NDI HOME / NOT_HOME classification implementation

This patch implements the HOME / NOT_HOME part of docs/ndi_home_classification_en.md as a dedicated runtime processing plan while reusing the existing driver-working-time pipeline.

Public processing plan

Use:

driver-home-classification-v1

The plan delegates to the shared driver-working-time-v1 pipeline and explicitly inserts:

support-evidence-normalization
-> ndi-home-classification
-> driving-derived-projections

The original driver-working-time-v1 plan does not run the optional NDI module by default. A request against that plan can opt in by explicitly requesting the ndi-home-classification module.

Reused projection structures

DriverWorkingTimeReusableProjectionBuilder.buildAllNonDrivingIntervalCoverage(...) runs the existing Esper interruption/card-absence/GNSS enrichment pipeline with a zero rest-candidate threshold. It therefore creates enriched evidence for every positive non-driving interruption without changing the legacy daily/weekly-rest threshold or outputs.

The implementation reuses DriverWorkingTimeRestCoverageInterval as the enriched NDI evidence model. It provides:

  • previous and next driving/vehicle identities;
  • NDI start, end, and duration;
  • card-absence duration and percentage;
  • begin/end boundary GNSS evidence;
  • boundary odometer and movement evidence.

Implemented classification rules

The rules are evaluated in the order given in the document, and the first matching rule determines the result:

  1. previous and next vehicles differ -> HOME;
  2. card absent for more than 80% of the NDI -> HOME;
  3. NDI longer than 24 hours -> HOME;
  4. no position available: NDI longer than 7.5 hours -> HOME, otherwise NOT_HOME;
  5. long NDI (longer than 7.5 hours) positioned in a company or driver home cluster -> HOME;
  6. long NDI positioned outside those clusters -> NOT_HOME;
  7. remaining short NDI -> NOT_HOME.

Every classification contains a DriverNdiHomeClassificationReason, so the first matching rule remains visible in the API response.
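
The rule cascade above can be sketched as a first-match chain. This is a minimal illustration, not the patch's code: the Evidence record is a hypothetical, simplified stand-in for DriverWorkingTimeRestCoverageInterval, the Reason values are invented for the example (the real DriverNdiHomeClassificationReason values may differ), and the thresholds are hard-coded from the documented defaults.

```java
import java.time.Duration;
import java.util.Objects;

final class NdiRuleSketch {

    enum Label { HOME, NOT_HOME }

    // Illustrative reason codes only; assumed names, not the real enum.
    enum Reason {
        VEHICLE_CHANGED, CARD_ABSENT, VERY_LONG, UNPOSITIONED_LONG, UNPOSITIONED_SHORT,
        LONG_IN_HOME_CLUSTER, LONG_OUTSIDE_HOME_CLUSTER, SHORT
    }

    // Hypothetical, flattened view of the enriched NDI evidence.
    record Evidence(String previousVehicle, String nextVehicle, Duration duration,
                    double cardAbsentPercent, boolean hasPosition, boolean inHomeCluster) {}

    record Result(Label label, Reason reason) {}

    static final Duration LONG = Duration.ofMinutes(450);        // ndi-long-minutes
    static final Duration VERY_LONG = Duration.ofMinutes(1440);  // ndi-very-long-minutes
    static final double CARD_REMOVAL_PERCENT = 80.0;             // ndi-card-removal-percent

    /** Evaluates the seven rules in order and returns the first match. */
    static Result classify(Evidence e) {
        boolean isLong = e.duration().compareTo(LONG) > 0;
        if (!Objects.equals(e.previousVehicle(), e.nextVehicle())) {
            return new Result(Label.HOME, Reason.VEHICLE_CHANGED);            // rule 1
        }
        if (e.cardAbsentPercent() > CARD_REMOVAL_PERCENT) {
            return new Result(Label.HOME, Reason.CARD_ABSENT);                // rule 2
        }
        if (e.duration().compareTo(VERY_LONG) > 0) {
            return new Result(Label.HOME, Reason.VERY_LONG);                  // rule 3
        }
        if (!e.hasPosition()) {                                               // rule 4
            return isLong
                    ? new Result(Label.HOME, Reason.UNPOSITIONED_LONG)
                    : new Result(Label.NOT_HOME, Reason.UNPOSITIONED_SHORT);
        }
        if (isLong) {                                                         // rules 5 and 6
            return e.inHomeCluster()
                    ? new Result(Label.HOME, Reason.LONG_IN_HOME_CLUSTER)
                    : new Result(Label.NOT_HOME, Reason.LONG_OUTSIDE_HOME_CLUSTER);
        }
        return new Result(Label.NOT_HOME, Reason.SHORT);                      // rule 7
    }
}
```

Because every branch returns both a label and a reason, a caller can always tell which rule decided the outcome. Note that the sketch treats "longer than" as a strict comparison, so an NDI of exactly 450 minutes counts as short; whether the real implementation is strict at the boundary is an assumption.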

Location learning and clustering

Only NDIs that are longer than 7.5 hours and have a resolved position are added to the location-learning corpus.

Position selection follows the document through the existing boundary-evidence resolver:

resolved begin-boundary evidence for the previous driving/vehicle context,
otherwise resolved end-boundary evidence for the next driving/vehicle context

The selected evidence is the closest eligible support-position event within the configured boundary lookup window, so it is an approximation when no event exists exactly at the driving boundary.
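
The selection described above has two parts: picking the closest support-position event inside a lookup window, and falling back from the begin boundary to the end boundary. The following is a rough sketch of both, assuming a hypothetical PositionEvent record and an inclusive, symmetric window around the boundary; the real resolver's eligibility checks and window semantics are not shown here and may differ.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class NdiPositionSelectionSketch {

    // Hypothetical support-position event; field names are assumptions.
    record PositionEvent(Instant timestamp, double latitude, double longitude) {}

    /** Returns the event closest in time to the boundary, if any lies within the window. */
    static Optional<PositionEvent> closestWithin(List<PositionEvent> events,
                                                 Instant boundary, Duration window) {
        return events.stream()
                .filter(e -> distance(boundary, e).compareTo(window) <= 0)
                .min(Comparator.comparing((PositionEvent e) -> distance(boundary, e)));
    }

    /** Prefers begin-boundary evidence and falls back to end-boundary evidence. */
    static Optional<PositionEvent> select(Optional<PositionEvent> beginBoundary,
                                          Optional<PositionEvent> endBoundary) {
        return beginBoundary.or(() -> endBoundary);
    }

    // Absolute time difference between the boundary and the event.
    private static Duration distance(Instant boundary, PositionEvent e) {
        return Duration.between(boundary, e.timestamp()).abs();
    }
}
```

When neither boundary yields an event, the result is empty, which corresponds to the "no position" case in the classification rules.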

The in-memory cache:

  • accumulates observations across one or more file-session executions;
  • deduplicates the same NDI across repeated/overlapping sessions;
  • retains the source session IDs as provenance;
  • stores the driver key on every observation;
  • does not permanently mark a driver as "actual" or "other".
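
One way to picture the deduplication and provenance behaviour is a map keyed by NDI identity whose values collect the contributing session IDs. This is only a sketch: the identity key (driver key plus NDI start and end) and all class and method names are assumptions, and TTL and size eviction are left out.

```java
import java.time.Instant;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class NdiObservationCacheSketch {

    // Assumed identity of an NDI observation; the real key may include more fields.
    record Key(String driverKey, Instant ndiStart, Instant ndiEnd) {}

    private final Map<Key, Set<String>> sessionsByNdi = new LinkedHashMap<>();

    /** Records the observation; returns true only if this NDI was not cached before. */
    boolean add(String driverKey, Instant ndiStart, Instant ndiEnd, String sessionId) {
        Key key = new Key(driverKey, ndiStart, ndiEnd);
        boolean isNew = !sessionsByNdi.containsKey(key);
        // A repeated NDI only contributes an additional provenance entry.
        sessionsByNdi.computeIfAbsent(key, k -> new LinkedHashSet<>()).add(sessionId);
        return isNew;
    }

    /** Session IDs that contributed the given NDI, in insertion order. */
    Set<String> sessionsFor(String driverKey, Instant ndiStart, Instant ndiEnd) {
        Set<String> sessions = sessionsByNdi.get(new Key(driverKey, ndiStart, ndiEnd));
        return sessions == null ? Set.of() : Collections.unmodifiableSet(sessions);
    }

    /** Number of distinct NDI observations held. */
    int size() {
        return sessionsByNdi.size();
    }
}
```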

For each result driver, the same cached corpus is viewed as:

actual-driver observations
other-driver observations

This makes the distinction request-relative and allows the corpus to be reused for another driver.
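
Because each observation carries its driver key, the actual/other split can be computed per request instead of being stored. A minimal sketch, assuming a hypothetical Observation record and View holder:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class NdiCorpusViewSketch {

    // Hypothetical cached observation; only the driver key matters for the split.
    record Observation(String driverKey, double latitude, double longitude) {}

    record View(List<Observation> actualDriver, List<Observation> otherDrivers) {}

    /** Splits the shared corpus relative to the driver currently being classified. */
    static View viewFor(String resultDriverKey, List<Observation> corpus) {
        Map<Boolean, List<Observation>> parts = corpus.stream()
                .collect(Collectors.partitioningBy(o -> o.driverKey().equals(resultDriverKey)));
        return new View(parts.get(true), parts.get(false));
    }
}
```

Calling viewFor with a different driver key over the same corpus yields a different split without touching the cached data.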

Clustering uses a Java DBSCAN implementation with Haversine distance. The defaults are a neighbourhood radius (eps) of 150 metres and a minimum of three points per cluster. Noise observations remain in the denominator for visit-share calculations but are never treated as home clusters.
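
For readers unfamiliar with the technique, the following is a compact, brute-force DBSCAN over latitude/longitude points using the Haversine great-circle distance. It is a textbook sketch and not the patch's implementation: the class and method names are invented, the neighbour search is quadratic, and visitSharePercent assumes that a visit share is simply the cluster's observation count divided by all observations, noise included.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class HaversineDbscanSketch {

    static final double EARTH_RADIUS_METERS = 6_371_008.8; // mean Earth radius
    static final int NOISE = -1;
    private static final int UNVISITED = -2;

    record Point(double latitudeDeg, double longitudeDeg) {}

    /** Great-circle distance in metres between two points given in degrees. */
    static double haversineMeters(Point a, Point b) {
        double lat1 = Math.toRadians(a.latitudeDeg());
        double lat2 = Math.toRadians(b.latitudeDeg());
        double dLat = lat2 - lat1;
        double dLon = Math.toRadians(b.longitudeDeg() - a.longitudeDeg());
        double h = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(lat1) * Math.cos(lat2) * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.min(1.0, Math.sqrt(h)));
    }

    /** Returns one label per point: a cluster index starting at 0, or NOISE. */
    static int[] cluster(List<Point> points, double epsMeters, int minPoints) {
        int[] labels = new int[points.size()];
        Arrays.fill(labels, UNVISITED);
        int nextCluster = 0;
        for (int i = 0; i < points.size(); i++) {
            if (labels[i] != UNVISITED) continue;
            List<Integer> seeds = neighbours(points, i, epsMeters);
            if (seeds.size() < minPoints) { labels[i] = NOISE; continue; }
            int clusterId = nextCluster++;
            labels[i] = clusterId;
            Deque<Integer> queue = new ArrayDeque<>(seeds);
            while (!queue.isEmpty()) {
                int j = queue.poll();
                if (labels[j] == NOISE) labels[j] = clusterId; // border point joins the cluster
                if (labels[j] != UNVISITED) continue;          // already assigned
                labels[j] = clusterId;
                List<Integer> reachable = neighbours(points, j, epsMeters);
                if (reachable.size() >= minPoints) queue.addAll(reachable); // j is a core point
            }
        }
        return labels;
    }

    /** Share of all observations (noise included) that fall into the given cluster. */
    static double visitSharePercent(int[] labels, int clusterId) {
        if (labels.length == 0) return 0.0;
        long count = Arrays.stream(labels).filter(l -> l == clusterId).count();
        return 100.0 * count / labels.length;
    }

    // Brute-force region query; includes the point itself, as in the classic definition.
    private static List<Integer> neighbours(List<Point> points, int index, double epsMeters) {
        List<Integer> result = new ArrayList<>();
        for (int k = 0; k < points.size(); k++) {
            if (haversineMeters(points.get(index), points.get(k)) <= epsMeters) result.add(k);
        }
        return result;
    }
}
```

With the documented defaults, cluster(points, 150, 3) groups observations that lie within 150 metres of a sufficiently dense neighbourhood, and an isolated observation is labelled NOISE while still counting towards the denominator in visitSharePercent.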

File-session learning scope

The dedicated plan defaults ndiLearnAllFileSessionDrivers to true.

For a request with explicit canonical driver keys, the plan internally loads all drivers from the selected file sessions for location learning and filters the response back to the originally requested drivers.

The scope is not broadened when:

  • the source selection is mixed or database-only;
  • the option is disabled;
  • the request uses only alternate card/source selectors and cannot be filtered safely by canonical driver key.
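
The conditions above amount to a single predicate followed by a filter on the response. The sketch below illustrates that shape only; the Request record, the SourceSelection enum, and both method names are hypothetical and do not mirror the actual request model.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class NdiLearningScopeSketch {

    enum SourceSelection { FILE_SESSIONS_ONLY, MIXED, DATABASE_ONLY }

    // Hypothetical summary of the request properties that matter for the decision.
    record Request(SourceSelection sources,
                   boolean learnAllFileSessionDrivers,   // ndiLearnAllFileSessionDrivers
                   boolean hasCanonicalDriverKeys) {}

    /** True when all drivers of the selected file sessions should be loaded for learning. */
    static boolean broadensScope(Request r) {
        return r.sources() == SourceSelection.FILE_SESSIONS_ONLY
                && r.learnAllFileSessionDrivers()
                && r.hasCanonicalDriverKeys();
    }

    /** Filters per-driver results back to the originally requested canonical driver keys. */
    static <T> Map<String, T> filterToRequested(Map<String, T> resultsByDriver,
                                                Set<String> requestedDriverKeys) {
        Map<String, T> filtered = new LinkedHashMap<>();
        resultsByDriver.forEach((driverKey, result) -> {
            if (requestedDriverKeys.contains(driverKey)) filtered.put(driverKey, result);
        });
        return filtered;
    }
}
```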

Configuration

The defaults are under:

eventhub:
  tachograph-file-session:
    processing:
      ndi-long-minutes: 450
      ndi-very-long-minutes: 1440
      ndi-card-removal-percent: 80
      ndi-visit-share-percent: 25
      ndi-dbscan-eps-meters: 150
      ndi-dbscan-min-points: 3
      ndi-location-cache-ttl: 4h
      ndi-location-cache-max-observations: 100000
      ndi-location-cache-namespace: default

For tenantless uploaded sessions, configure a namespace that prevents unrelated operational contexts from sharing a corpus. Explicit tenant keys always create tenant-scoped corpora.

Response extension

Each driver partition can now contain:

ndiHomeClassification

It includes:

  • all NDI classifications;
  • company and driver home cluster IDs;
  • cluster centroids and visit statistics;
  • actual-driver versus other-driver cached observation counts;
  • diagnostics and notes.

The field is omitted when the optional module was not executed, preserving the existing JSON shape for normal driver-working-time-v1 calls.

Current implementation boundary

This patch implements sections 1-4 of the document: NDI derivation/enrichment, location clustering, home-location determination, and HOME / NOT_HOME classification.

Section 5, border-crossing/country trip segmentation, is intentionally not included yet. It needs a separate country-resolution abstraction and a decision between local geographic data, PostGIS, or an external reverse-geocoding provider.