# NDI HOME / NOT_HOME classification and country trip segmentation

This patch implements the HOME / NOT_HOME classification and the country-trip segmentation described in `docs/ndi_home_classification_en.md`. It reuses the existing driver-working-time pipeline and adds configurable Nominatim reverse geocoding only where source country evidence is missing.

## Public processing plan

Use:

```text
driver-home-classification-v1
```

The dedicated plan delegates to the shared `driver-working-time-v1` pipeline and explicitly inserts the two new modules between `support-evidence-normalization` and `driving-derived-projections`:

```text
support-evidence-normalization
-> ndi-home-classification
-> country-trip-segmentation
-> driving-derived-projections
```

The normal `driver-working-time-v1` plan keeps both modules optional. They can also be requested explicitly as `ndi-home-classification` and `country-trip-segmentation`.

## Reused projection structures

`DriverWorkingTimeReusableProjectionBuilder.buildAllNonDrivingIntervalCoverage(...)` runs the existing Esper interruption/card-absence/GNSS enrichment pipeline with a zero rest-candidate threshold. It creates enriched evidence for every non-driving interval (NDI) of positive duration without changing the legacy daily/weekly-rest threshold or outputs.

The implementation reuses `DriverWorkingTimeRestCoverageInterval` as the enriched NDI evidence model. It provides:

- previous and next driving/vehicle identities;
- NDI start, end, and duration;
- card-absence duration and percentage;
- begin/end boundary GNSS evidence;
- boundary odometer and movement evidence.

## HOME / NOT_HOME classification

The rules are evaluated in the order given in the document, and the first matching rule decides the result:

1. previous and next vehicles differ -> `HOME`;
2. card absent for more than 80% of the NDI -> `HOME`;
3. NDI longer than 24 hours -> `HOME`;
4. no position: NDI longer than 7.5 hours -> `HOME`, otherwise `NOT_HOME`;
5. positioned long NDI (longer than 7.5 hours) in a company or driver home cluster -> `HOME`;
6. positioned long NDI outside those clusters -> `NOT_HOME`;
7. remaining short NDI -> `NOT_HOME`.

Every classification contains a `DriverNdiHomeClassificationReason`, so the first matching rule remains visible in the API response.
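
The rule order above can be read as a first-match chain. The following is a minimal sketch of that chain, not the patch's actual code: the `Ndi` record, the `Reason` constants, and the class name are hypothetical, and only the rule order and the 80% / 24 h / 7.5 h thresholds are taken from the list.

```java
import java.time.Duration;
import java.util.Objects;

// Hypothetical types for illustration; the real model is DriverWorkingTimeRestCoverageInterval.
enum HomeClass { HOME, NOT_HOME }

enum Reason {
    VEHICLE_CHANGE, CARD_ABSENT, LONGER_THAN_24H, NO_POSITION_LONG,
    NO_POSITION_SHORT, IN_HOME_CLUSTER, OUTSIDE_HOME_CLUSTER, SHORT_NDI
}

record Ndi(String previousVehicle, String nextVehicle, double cardAbsentPercent,
           Duration duration, boolean hasPosition, boolean inHomeCluster) {}

record Classification(HomeClass result, Reason reason) {}

final class NdiRuleSketch {
    static final Duration LONG_NDI = Duration.ofMinutes(450); // 7.5 hours
    static final Duration ONE_DAY = Duration.ofHours(24);

    /** Evaluates the seven rules top to bottom and returns at the first match. */
    static Classification classify(Ndi ndi) {
        boolean isLong = ndi.duration().compareTo(LONG_NDI) > 0;
        if (!Objects.equals(ndi.previousVehicle(), ndi.nextVehicle())) {
            return new Classification(HomeClass.HOME, Reason.VEHICLE_CHANGE);        // rule 1
        }
        if (ndi.cardAbsentPercent() > 80.0) {
            return new Classification(HomeClass.HOME, Reason.CARD_ABSENT);           // rule 2
        }
        if (ndi.duration().compareTo(ONE_DAY) > 0) {
            return new Classification(HomeClass.HOME, Reason.LONGER_THAN_24H);       // rule 3
        }
        if (!ndi.hasPosition()) {                                                    // rule 4
            return isLong
                    ? new Classification(HomeClass.HOME, Reason.NO_POSITION_LONG)
                    : new Classification(HomeClass.NOT_HOME, Reason.NO_POSITION_SHORT);
        }
        if (isLong) {                                                                // rules 5 and 6
            return ndi.inHomeCluster()
                    ? new Classification(HomeClass.HOME, Reason.IN_HOME_CLUSTER)
                    : new Classification(HomeClass.NOT_HOME, Reason.OUTSIDE_HOME_CLUSTER);
        }
        return new Classification(HomeClass.NOT_HOME, Reason.SHORT_NDI);             // rule 7
    }
}
```

Because every branch returns a reason together with the result, a caller can always tell which rule fired, which mirrors the role of `DriverNdiHomeClassificationReason` in the response.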

## Location learning and clustering

Only NDIs longer than 7.5 hours with a position are added to the learning corpus. Position selection uses the existing resolved begin-boundary evidence and falls back to resolved end-boundary evidence.

The in-memory cache:

- accumulates observations across one or more file-session executions;
- deduplicates the same NDI across repeated/overlapping sessions;
- retains source-session provenance;
- stores the driver key on every observation;
- calculates actual-driver and other-driver views per request.

Clustering uses Java DBSCAN with Haversine distance. The defaults are a neighbourhood radius of 150 metres and a minimum of three points. Noise observations remain in the denominator for visit-share calculations but are never home clusters.
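
To make the clustering step concrete, here is a minimal, brute-force DBSCAN over latitude/longitude pairs with Haversine distance. It is a sketch under assumptions: the class and method names are hypothetical, the neighbourhood search is a plain O(n²) scan, and the patch's own implementation may differ in structure and indexing.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class HaversineDbscanSketch {
    static final double EARTH_RADIUS_M = 6_371_008.8; // mean Earth radius in metres
    static final int NOISE = -1;
    private static final int UNVISITED = -2;

    /** Great-circle distance in metres between two points given in decimal degrees. */
    static double haversineMetres(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Returns one label per point ({lat, lon}): a cluster id from 0 upwards, or NOISE. */
    static int[] cluster(double[][] points, double epsMetres, int minPoints) {
        int[] labels = new int[points.length];
        Arrays.fill(labels, UNVISITED);
        int clusterId = 0;
        for (int i = 0; i < points.length; i++) {
            if (labels[i] != UNVISITED) continue;
            List<Integer> seeds = neighbours(points, i, epsMetres);
            if (seeds.size() < minPoints) { labels[i] = NOISE; continue; }
            labels[i] = clusterId;
            Deque<Integer> queue = new ArrayDeque<>(seeds);
            while (!queue.isEmpty()) {
                int j = queue.poll();
                if (labels[j] == NOISE) labels[j] = clusterId;   // former noise becomes a border point
                if (labels[j] != UNVISITED) continue;            // already assigned
                labels[j] = clusterId;
                List<Integer> reachable = neighbours(points, j, epsMetres);
                if (reachable.size() >= minPoints) queue.addAll(reachable); // core point: keep expanding
            }
            clusterId++;
        }
        return labels;
    }

    // Linear scan; the point itself counts towards its own neighbourhood.
    private static List<Integer> neighbours(double[][] points, int index, double epsMetres) {
        List<Integer> result = new ArrayList<>();
        for (int k = 0; k < points.length; k++) {
            double d = haversineMetres(points[index][0], points[index][1], points[k][0], points[k][1]);
            if (d <= epsMetres) result.add(k);
        }
        return result;
    }
}
```

With the documented defaults the call would be `HaversineDbscanSketch.cluster(points, 150.0, 3)`. A visit share computed from the labels would divide a cluster's size by the total number of observations, including those labelled `NOISE`, so that noise stays in the denominator.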

## Country trip segmentation

`DriverCountryTripSegmentationService` builds country segments over driving intervals.

Evidence precedence is:

1. explicit tachograph border-crossing event (`countryFrom` / `countryTo`);
2. country code already present on a positioned support event;
3. Nominatim reverse lookup for a positioned event without a usable country code.

Country values are normalized to ISO 3166-1 alpha-2 where a mapping is known. Segment boundaries retain their evidence source:

```text
EXPLICIT_BORDER_CROSSING
GNSS_SOURCE_COUNTRY_CHANGE
NOMINATIM_COUNTRY_CHANGE
VEHICLE_CHANGE
FINAL
```

The result includes segment counts, explicit-border counts, remote lookup counts, cache-hit counts, unresolved-coordinate counts, warnings, and OpenStreetMap attribution.
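
The evidence precedence described in this section can be sketched as a fallback chain in which the remote lookup is only invoked when the two cheaper sources yield nothing usable. All names below (`CountrySource`, `CountryEvidence`, `CountryEvidenceSketch`) are hypothetical, and the `normalize` helper is a deliberate simplification: it only accepts two-letter codes and upper-cases them, whereas the patch maps other representations to alpha-2 where a mapping is known.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical evidence origin, ordered by precedence.
enum CountrySource { EXPLICIT_BORDER_CROSSING, SOURCE_EVENT_COUNTRY, NOMINATIM }

record CountryEvidence(String countryCode, CountrySource source) {}

final class CountryEvidenceSketch {

    /**
     * Picks the first usable country in precedence order. The reverse lookup is passed
     * as a Supplier so that it is only executed when the first two sources are empty.
     */
    static Optional<CountryEvidence> resolve(String borderCountryTo,
                                             String eventCountry,
                                             Supplier<Optional<String>> reverseLookup) {
        Optional<String> border = normalize(borderCountryTo);
        if (border.isPresent()) {
            return Optional.of(new CountryEvidence(border.get(), CountrySource.EXPLICIT_BORDER_CROSSING));
        }
        Optional<String> fromEvent = normalize(eventCountry);
        if (fromEvent.isPresent()) {
            return Optional.of(new CountryEvidence(fromEvent.get(), CountrySource.SOURCE_EVENT_COUNTRY));
        }
        return reverseLookup.get()
                .flatMap(CountryEvidenceSketch::normalize)
                .map(code -> new CountryEvidence(code, CountrySource.NOMINATIM));
    }

    /** Simplified normalization: trims, upper-cases, and accepts only two-letter codes. */
    static Optional<String> normalize(String raw) {
        if (raw == null || raw.isBlank()) {
            return Optional.empty();
        }
        String code = raw.trim().toUpperCase(Locale.ROOT);
        return code.length() == 2 ? Optional.of(code) : Optional.empty();
    }
}
```

An empty result corresponds to an unresolved coordinate, which the result object counts separately rather than treating as an error.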

## Nominatim integration

The client calls the reverse endpoint with:

```text
format=jsonv2
zoom=3
addressdetails=1
layer=address
```

Only `address.country_code` is required by the classification/segmentation logic. A failed lookup does not fail the whole processing plan; the coordinate remains unresolved and a diagnostic warning is returned.
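
As an illustration of the request shape, the sketch below builds a reverse-lookup URI with the parameters listed above and wraps it in a JDK `HttpRequest` carrying an identifying `User-Agent`. The class and method names are hypothetical, the six-decimal coordinate formatting is an assumption, and `https://nominatim.example.org` in the usage note stands in for a self-hosted endpoint; only the query parameters come from this document.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.Locale;

final class ReverseRequestSketch {

    /** Builds <base>/reverse?... with the fixed parameters used for country resolution. */
    static URI reverseUri(String baseUrl, double lat, double lon, String email) {
        StringBuilder url = new StringBuilder(baseUrl.replaceAll("/+$", ""))
                .append("/reverse?format=jsonv2&zoom=3&addressdetails=1&layer=address")
                // Locale.ROOT keeps the decimal separator a dot regardless of the JVM locale.
                .append(String.format(Locale.ROOT, "&lat=%.6f&lon=%.6f", lat, lon));
        if (email != null && !email.isBlank()) {
            url.append("&email=").append(URLEncoder.encode(email, StandardCharsets.UTF_8));
        }
        return URI.create(url.toString());
    }

    /** Wraps the URI in a GET request that identifies the calling application. */
    static HttpRequest reverseRequest(URI uri, String userAgent, Duration readTimeout) {
        return HttpRequest.newBuilder(uri)
                .header("User-Agent", userAgent)
                .timeout(readTimeout)
                .GET()
                .build();
    }
}
```

For example, `reverseUri("https://nominatim.example.org", 52.5, 13.4, "")` yields a URI whose query ends in `&lat=52.500000&lon=13.400000`. Sending the request and reading `address.country_code` from the JSON body is left out here.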

Safeguards:

- configurable, identifying `User-Agent`;
- optional identifying email;
- shared coordinate cache with TTL and maximum size;
- coordinate quantization for cache reuse;
- one execution-level remote lookup budget;
- fully serialized remote calls;
- configurable minimum interval between requests;
- enforced minimum interval of one second for `nominatim.openstreetmap.org`;
- public OSM endpoint disabled unless deliberately opted in;
- configurable endpoint so a self-hosted or contracted Nominatim service can be substituted without code changes.
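
Three of the safeguards listed above (coordinate quantization, the per-execution lookup budget, and the minimum request interval) are small enough to sketch directly. The class names are hypothetical and the code is an illustration under assumptions, not the patch's implementation; in particular, the gate takes the current time as a parameter so that its behaviour is deterministic.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.concurrent.atomic.AtomicInteger;

final class CoordinateCacheKeySketch {
    /** Rounds both coordinates to a fixed number of decimals so nearby points share a cache entry. */
    static String cacheKey(double lat, double lon, int decimalPlaces) {
        return quantize(lat, decimalPlaces) + "," + quantize(lon, decimalPlaces);
    }

    private static String quantize(double value, int decimalPlaces) {
        return BigDecimal.valueOf(value).setScale(decimalPlaces, RoundingMode.HALF_UP).toPlainString();
    }
}

final class LookupBudgetSketch {
    private final AtomicInteger remaining;

    LookupBudgetSketch(int maxRemoteLookups) {
        this.remaining = new AtomicInteger(maxRemoteLookups);
    }

    /** Returns true while budget is left; never drops below zero. */
    boolean tryAcquire() {
        return remaining.getAndUpdate(n -> n > 0 ? n - 1 : 0) > 0;
    }
}

final class MinIntervalGateSketch {
    private final long minIntervalNanos;
    private long nextAllowedNanos = Long.MIN_VALUE;

    MinIntervalGateSketch(long minIntervalNanos) {
        this.minIntervalNanos = minIntervalNanos;
    }

    /**
     * Reserves the next request slot and returns how long the caller must wait first.
     * The method is synchronized, so concurrent callers are handed consecutive slots.
     */
    synchronized long reserveDelayNanos(long nowNanos) {
        long start = Math.max(nowNanos, nextAllowedNanos);
        nextAllowedNanos = start + minIntervalNanos;
        return start - nowNanos;
    }
}
```

At four decimal places, one step in latitude is roughly 11 metres, so `cacheKey(52.520008, 13.404954, 4)` and `cacheKey(52.52003, 13.40499, 4)` both produce `52.5200,13.4050` and the second lookup can be served from the cache. A caller would typically pass `System.nanoTime()` to `reserveDelayNanos` and sleep for the returned duration before issuing the request.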

### Configuration

```yaml
eventhub:
  reverse-geocoding:
    enabled: true
    provider: NOMINATIM
    nominatim:
      base-url: https://nominatim.openstreetmap.org
      public-service-enabled: false
      user-agent: eventhub-tachograph/0.1 (Nominatim reverse geocoding)
      email: ""
      accept-language: en
      connect-timeout: 10s
      read-timeout: 20s
      minimum-request-interval: 1s
      cache-ttl: 30d
      cache-max-entries: 100000
      coordinate-decimal-places: 4
      max-remote-lookups-per-execution: 25
```

Environment variables use the `NOMINATIM_*` names shown in `application.yml`.

For a self-hosted endpoint, set `NOMINATIM_BASE_URL`; `public-service-enabled` is not needed. For deliberately selected, policy-compliant, low-volume use of the donated public endpoint, additionally set:

```text
NOMINATIM_PUBLIC_SERVICE_ENABLED=true
NOMINATIM_USER_AGENT=<application/version and contact identifier>
NOMINATIM_EMAIL=<contact email when appropriate>
```

Production or recurring tachograph batch processing should use a self-hosted instance or a provider whose terms cover the expected workload. Coordinates may reveal vehicle or driver movements; do not send confidential or personal-location data to a public endpoint without an appropriate legal and privacy basis.

## File-session learning scope

The dedicated plan defaults `ndiLearnAllFileSessionDrivers` to `true`. For a request with explicit canonical driver keys, it internally loads all drivers from the selected file sessions for location learning and then filters the response back to the originally requested drivers.

The scope is not broadened when the source is mixed/database-only, the option is disabled, or the result cannot safely be filtered by canonical driver key.
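
The broaden-then-filter behaviour can be summarized in a few lines. This is a sketch with hypothetical names (`SourceKind`, `LearningScopeSketch`); it only encodes the three conditions stated above and the final filtering step, not how the plan actually loads drivers.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical description of where the request's data comes from.
enum SourceKind { FILE_SESSION, MIXED, DATABASE_ONLY }

final class LearningScopeSketch {

    /** The scope is broadened only for pure file-session sources, with the option on and safe filtering. */
    static boolean broadenScope(boolean learnAllFileSessionDrivers,
                                SourceKind source,
                                boolean filterableByDriverKey) {
        return learnAllFileSessionDrivers
                && source == SourceKind.FILE_SESSION
                && filterableByDriverKey;
    }

    /** Restricts the per-driver results to the driver keys that were originally requested. */
    static <T> Map<String, T> filterToRequested(Map<String, T> resultsByDriverKey,
                                                Set<String> requestedDriverKeys) {
        Map<String, T> filtered = new LinkedHashMap<>(resultsByDriverKey);
        filtered.keySet().retainAll(requestedDriverKeys);
        return filtered;
    }
}
```

Learning from every driver in the session gives the clustering more observations, while the filter keeps the response limited to the drivers the caller asked for.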

## Response extensions

Each driver partition can contain:

```text
ndiHomeClassification
countryTripSegmentation
```

The fields are omitted when their optional modules were not executed, preserving the existing JSON shape for normal `driver-working-time-v1` calls.