# NDI HOME / NOT_HOME classification implementation

This patch implements the HOME / NOT_HOME part of `docs/ndi_home_classification_en.md` as a dedicated runtime processing plan while reusing the existing driver-working-time pipeline.

## Public processing plan

Use:

```text
driver-home-classification-v1
```

The plan delegates to the shared `driver-working-time-v1` pipeline and explicitly inserts:

```text
support-evidence-normalization
-> ndi-home-classification
-> driving-derived-projections
```

The original `driver-working-time-v1` plan does not run the optional NDI module by default. It can opt in by explicitly requesting `ndi-home-classification`.

## Reused projection structures

`DriverWorkingTimeReusableProjectionBuilder.buildAllNonDrivingIntervalCoverage(...)` runs the existing Esper interruption/card-absence/GNSS enrichment pipeline with a zero rest-candidate threshold. It therefore creates enriched evidence for every non-driving interruption of positive duration, without changing the legacy daily/weekly-rest threshold or its outputs.

The implementation reuses `DriverWorkingTimeRestCoverageInterval` as the enriched NDI evidence model. It provides:

- previous and next driving/vehicle identities;
- NDI start, end, and duration;
- card-absence duration and percentage;
- begin/end boundary GNSS evidence;
- boundary odometer and movement evidence.

## Implemented classification rules

The rules are evaluated in the order given in the document, and the first matching rule decides the result:

1. previous and next vehicles differ -> `HOME`;
2. card absent for more than 80% -> `HOME`;
3. NDI longer than 24 hours -> `HOME`;
4. no position: NDI longer than 7.5 hours -> `HOME`, otherwise `NOT_HOME`;
5. positioned long NDI in a company or driver home cluster -> `HOME`;
6. positioned long NDI outside those clusters -> `NOT_HOME`;
7. remaining short NDI -> `NOT_HOME`.

Every classification contains a `DriverNdiHomeClassificationReason`, so the first matching rule remains visible in the API response.

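The first-match ordering above can be sketched as a plain decision chain. This is a minimal illustration, not the patch's code: `NdiRuleOrderSketch`, its `Ndi` and `Result` records, and the `Reason` constants are hypothetical names, and the thresholds simply mirror the documented defaults (450 minutes, 1440 minutes, 80%).

```java
import java.time.Duration;

// Illustrative sketch only: every type and name here is hypothetical,
// not one of the patch's actual classes.
final class NdiRuleOrderSketch {
    enum Label { HOME, NOT_HOME }

    enum Reason {
        VEHICLE_CHANGED, CARD_ABSENT, VERY_LONG,
        NO_POSITION_LONG, NO_POSITION_SHORT,
        IN_HOME_CLUSTER, OUTSIDE_HOME_CLUSTER, SHORT
    }

    record Ndi(boolean vehicleChanged, double cardAbsentPercent, Duration duration,
               boolean hasPosition, boolean inHomeCluster) {}

    record Result(Label label, Reason reason) {}

    static final Duration LONG_NDI = Duration.ofMinutes(450);        // 7.5 hours
    static final Duration VERY_LONG_NDI = Duration.ofMinutes(1440);  // 24 hours
    static final double CARD_ABSENT_PERCENT = 80.0;

    static Result classify(Ndi ndi) {
        boolean isLong = ndi.duration().compareTo(LONG_NDI) > 0;

        // Rules 1-3: unconditional HOME signals, checked before any position logic.
        if (ndi.vehicleChanged()) return new Result(Label.HOME, Reason.VEHICLE_CHANGED);
        if (ndi.cardAbsentPercent() > CARD_ABSENT_PERCENT) return new Result(Label.HOME, Reason.CARD_ABSENT);
        if (ndi.duration().compareTo(VERY_LONG_NDI) > 0) return new Result(Label.HOME, Reason.VERY_LONG);

        // Rule 4: without a position, only the duration can decide.
        if (!ndi.hasPosition()) {
            return isLong ? new Result(Label.HOME, Reason.NO_POSITION_LONG)
                          : new Result(Label.NOT_HOME, Reason.NO_POSITION_SHORT);
        }

        // Rules 5-6: positioned long NDIs depend on home-cluster membership.
        if (isLong) {
            return ndi.inHomeCluster() ? new Result(Label.HOME, Reason.IN_HOME_CLUSTER)
                                       : new Result(Label.NOT_HOME, Reason.OUTSIDE_HOME_CLUSTER);
        }

        // Rule 7: everything that remains is a short NDI.
        return new Result(Label.NOT_HOME, Reason.SHORT);
    }
}
```

Returning the reason together with the label is what keeps the deciding rule visible to the caller; for example, a short NDI inside a home cluster still ends up as `NOT_HOME` with reason `SHORT` in this sketch.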
## Location learning and clustering

Only NDIs that are longer than 7.5 hours and have a resolved position are added to the location corpus.

Position selection follows the document through the existing boundary-evidence resolver:

```text
resolved begin-boundary evidence for the previous driving/vehicle context,
otherwise resolved end-boundary evidence for the next driving/vehicle context
```

The selected evidence is the closest eligible support-position event within the configured boundary lookup window, so it is an approximation when no event exists exactly at the driving boundary.

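The begin-then-end fallback and the "closest event within the window" selection can be sketched as follows. This is a simplified illustration under stated assumptions: `BoundaryEvidenceSketch` and `PositionEvent` are hypothetical names, the lookup window is passed in as a plain `Duration`, and eligibility by driving/vehicle context is not modelled.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Illustrative sketch only: hypothetical types, not the existing boundary-evidence resolver.
final class BoundaryEvidenceSketch {
    record PositionEvent(Instant at, double lat, double lon) {}

    // Closest event to the boundary, considering only events inside the lookup window.
    static Optional<PositionEvent> closestWithin(Instant boundary, Duration window,
                                                 List<PositionEvent> events) {
        return events.stream()
                .filter(e -> Duration.between(boundary, e.at()).abs().compareTo(window) <= 0)
                .min(Comparator.comparing(
                        (PositionEvent e) -> Duration.between(boundary, e.at()).abs()));
    }

    // Prefer begin-boundary evidence; fall back to end-boundary evidence.
    static Optional<PositionEvent> select(Instant ndiStart, Instant ndiEnd, Duration window,
                                          List<PositionEvent> events) {
        return closestWithin(ndiStart, window, events)
                .or(() -> closestWithin(ndiEnd, window, events));
    }
}
```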
The in-memory cache:

- accumulates observations across one or more file-session executions;
- deduplicates the same NDI across repeated/overlapping sessions;
- retains the source session IDs as provenance;
- stores the driver key on every observation;
- does not permanently mark a driver as "actual" or "other".

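One way to picture the deduplication and provenance behaviour listed above is a map keyed by NDI identity. The sketch below is an assumption-laden illustration: `NdiObservationCacheSketch` and `NdiKey` are hypothetical, the identity is assumed to be driver key plus start and end instants, and TTL expiry, the observation limit, namespacing, and thread safety are all left out.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only: hypothetical types, not the patch's cache implementation.
final class NdiObservationCacheSketch {
    // Assumed NDI identity: the same driver and interval seen in two sessions is one observation.
    record NdiKey(String driverKey, Instant start, Instant end) {}

    private final Map<NdiKey, Set<String>> sessionsByNdi = new LinkedHashMap<>();

    // Adding a known NDI again only extends its provenance; it does not add an observation.
    void observe(NdiKey key, String sessionId) {
        sessionsByNdi.computeIfAbsent(key, k -> new TreeSet<>()).add(sessionId);
    }

    int size() {
        return sessionsByNdi.size();
    }

    // Session IDs that contributed this NDI, as an immutable copy.
    Set<String> provenance(NdiKey key) {
        return Set.copyOf(sessionsByNdi.getOrDefault(key, Set.of()));
    }
}
```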
For each result driver, the same cached corpus is viewed as:

```text
actual-driver observations
other-driver observations
```

This makes the distinction request-relative and allows the corpus to be reused for another driver.

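Because the driver key is stored on every observation, the actual/other split can be computed on demand instead of being persisted. A minimal sketch, assuming a hypothetical `Observation` record and helper name:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch only: hypothetical types, not the patch's corpus model.
final class CorpusViewSketch {
    record Observation(String driverKey, double lat, double lon) {}

    // Key true -> actual-driver observations, key false -> other-driver observations.
    static Map<Boolean, List<Observation>> viewFor(String resultDriverKey,
                                                   List<Observation> corpus) {
        return corpus.stream()
                .collect(Collectors.partitioningBy(o -> o.driverKey().equals(resultDriverKey)));
    }
}
```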
Clustering uses Java DBSCAN with Haversine distance. The defaults are an epsilon radius of 150 metres and a minimum of three points per cluster. Noise observations remain in the denominator for visit-share calculations but are never treated as home clusters.

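For reference, the distance metric and the visit-share denominator can be sketched as below. This is not the patch's DBSCAN code: `HaversineSketch` and its method names are hypothetical, the Earth radius is the commonly used mean value, and how the resulting share is compared with `ndi-visit-share-percent` is not shown.

```java
// Illustrative sketch only: hypothetical helper, not the patch's clustering code.
final class HaversineSketch {
    static final double EARTH_RADIUS_METERS = 6_371_008.8; // mean Earth radius

    // Great-circle distance in metres between two WGS84 coordinates given in degrees.
    static double distanceMeters(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);
        double sinPhi = Math.sin(dPhi / 2);
        double sinLambda = Math.sin(dLambda / 2);
        double a = sinPhi * sinPhi + Math.cos(phi1) * Math.cos(phi2) * sinLambda * sinLambda;
        // Clamp to 1.0 to guard against rounding slightly above the asin domain.
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // DBSCAN neighbourhood test for a given epsilon in metres.
    static boolean isNeighbour(double lat1, double lon1, double lat2, double lon2, double epsMeters) {
        return distanceMeters(lat1, lon1, lat2, lon2) <= epsMeters;
    }

    // Share of visits in one cluster; the total includes noise observations.
    static double visitSharePercent(int clusterVisits, int totalObservationsIncludingNoise) {
        if (totalObservationsIncludingNoise == 0) {
            return 0.0;
        }
        return 100.0 * clusterVisits / totalObservationsIncludingNoise;
    }
}
```

With the 150-metre default, two points that differ by 0.001 degrees of latitude (roughly 111 metres) are neighbours. Keeping noise in the denominator means a cluster's share shrinks as unclustered stops accumulate.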
## File-session learning scope

The dedicated plan defaults `ndiLearnAllFileSessionDrivers` to `true`.

For a request with explicit canonical driver keys, the plan internally loads all drivers from the selected file sessions for location learning and filters the response back to the originally requested drivers.

The scope is not broadened when:

- the source selection is mixed or database-only;
- the option is disabled;
- the request uses only alternate card/source selectors and cannot be filtered safely by canonical driver key.

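The three exclusions above amount to a single guard. A minimal sketch, in which `LearningScopeSketch`, the `SourceSelection` enum, and the parameter names are all hypothetical stand-ins for however the plan represents the request:

```java
// Illustrative sketch only: hypothetical names, not the plan's actual request model.
final class LearningScopeSketch {
    enum SourceSelection { FILE_SESSIONS_ONLY, MIXED, DATABASE_ONLY }

    // True only when learning may load all drivers from the selected file sessions.
    static boolean broadenLearningScope(boolean learnAllFileSessionDrivers,
                                        SourceSelection sourceSelection,
                                        boolean hasCanonicalDriverKeys) {
        return learnAllFileSessionDrivers
                && sourceSelection == SourceSelection.FILE_SESSIONS_ONLY
                && hasCanonicalDriverKeys; // needed to filter the response back safely
    }
}
```

Requiring canonical driver keys reflects the stated constraint that the broadened result must be filterable back to the originally requested drivers.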
## Configuration

The defaults are under:

```yaml
eventhub:
  tachograph-file-session:
    processing:
      ndi-long-minutes: 450
      ndi-very-long-minutes: 1440
      ndi-card-removal-percent: 80
      ndi-visit-share-percent: 25
      ndi-dbscan-eps-meters: 150
      ndi-dbscan-min-points: 3
      ndi-location-cache-ttl: 4h
      ndi-location-cache-max-observations: 100000
      ndi-location-cache-namespace: default
```

For tenantless uploaded sessions, configure a namespace that prevents unrelated operational contexts from sharing a corpus. Explicit tenant keys always create tenant-scoped corpora.

## Response extension

Each driver partition can now contain:

```text
ndiHomeClassification
```

It includes:

- all NDI classifications;
- company and driver home cluster IDs;
- cluster centroids and visit statistics;
- actual-driver versus other-driver cached observation counts;
- diagnostics and notes.

The field is omitted when the optional module was not executed, preserving the existing JSON shape for normal `driver-working-time-v1` calls.

## Current implementation boundary

This patch implements sections 1-4 of the document: NDI derivation/enrichment, location clustering, home-location determination, and HOME / NOT_HOME classification.

Section 5, border-crossing/country trip segmentation, is intentionally not included yet. It needs a separate country-resolution abstraction and a decision on the data source: local geographic data, PostGIS, or an external reverse-geocoding provider.