# NDI HOME / NOT_HOME classification and country trip segmentation

This patch implements the HOME / NOT_HOME classification and the country-trip segmentation described in `docs/ndi_home_classification_en.md`. It reuses the existing driver-working-time pipeline and adds configurable Nominatim reverse geocoding only where source country evidence is missing.

## Public processing plan

Use:

```text
driver-home-classification-v1
```

The dedicated plan delegates to the shared `driver-working-time-v1` pipeline and explicitly inserts:

```text
support-evidence-normalization
-> ndi-home-classification
-> country-trip-segmentation
-> driving-derived-projections
```

The normal `driver-working-time-v1` plan keeps both modules optional. They can also be requested explicitly as `ndi-home-classification` and `country-trip-segmentation`.

## Reused projection structures

`DriverWorkingTimeReusableProjectionBuilder.buildAllNonDrivingIntervalCoverage(...)` runs the existing Esper interruption/card-absence/GNSS enrichment pipeline with a zero rest-candidate threshold. It creates enriched evidence for every non-driving interval (NDI) of positive duration without changing the legacy daily/weekly-rest threshold or outputs.

The implementation reuses `DriverWorkingTimeRestCoverageInterval` as the enriched NDI evidence model. It provides:

- previous and next driving/vehicle identities;
- NDI start, end, and duration;
- card-absence duration and percentage;
- begin/end boundary GNSS evidence;
- boundary odometer and movement evidence.

## HOME / NOT_HOME classification

The rules are evaluated in the order given in the document, and the first matching rule decides the result:

1. previous and next vehicles differ -> `HOME`;
2. card absent for more than 80% of the NDI -> `HOME`;
3. NDI longer than 24 hours -> `HOME`;
4. no position: NDI longer than 7.5 hours -> `HOME`, otherwise `NOT_HOME`;
5. positioned long NDI in a company or driver home cluster -> `HOME`;
6. positioned long NDI outside those clusters -> `NOT_HOME`;
7. remaining short NDI -> `NOT_HOME`.

Every classification contains a `DriverNdiHomeClassificationReason`, so the first matching rule remains visible in the API response.

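
The rule order above can be sketched as a first-match chain. This is only an illustration: the class name, the `Reason` constants, and the method signature are hypothetical (the real `DriverNdiHomeClassificationReason` values are not listed here), and it assumes that a "long" NDI means longer than 7.5 hours, as in rule 4.

```java
import java.time.Duration;

final class NdiHomeRuleSketch {

    enum Status { HOME, NOT_HOME }

    // Hypothetical reason names, one per rule branch; not the real enum values.
    enum Reason {
        VEHICLE_CHANGE(Status.HOME),
        CARD_ABSENT_OVER_80_PERCENT(Status.HOME),
        LONGER_THAN_24_HOURS(Status.HOME),
        NO_POSITION_LONG(Status.HOME),
        NO_POSITION_SHORT(Status.NOT_HOME),
        LONG_IN_HOME_CLUSTER(Status.HOME),
        LONG_OUTSIDE_HOME_CLUSTER(Status.NOT_HOME),
        SHORT(Status.NOT_HOME);

        final Status status;

        Reason(Status status) {
            this.status = status;
        }
    }

    static final Duration LONG_NDI = Duration.ofMinutes(450);   // 7.5 hours
    static final Duration VERY_LONG_NDI = Duration.ofHours(24);

    /** Returns the first matching rule; the rule carries the HOME / NOT_HOME status. */
    static Reason classify(boolean vehicleChanged, double cardAbsentPercent, Duration duration,
                           boolean hasPosition, boolean inHomeCluster) {
        if (vehicleChanged) return Reason.VEHICLE_CHANGE;                                 // rule 1
        if (cardAbsentPercent > 80.0) return Reason.CARD_ABSENT_OVER_80_PERCENT;          // rule 2
        if (duration.compareTo(VERY_LONG_NDI) > 0) return Reason.LONGER_THAN_24_HOURS;    // rule 3
        boolean isLong = duration.compareTo(LONG_NDI) > 0;
        if (!hasPosition) {                                                               // rule 4
            return isLong ? Reason.NO_POSITION_LONG : Reason.NO_POSITION_SHORT;
        }
        if (isLong) {                                                                     // rules 5 and 6
            return inHomeCluster ? Reason.LONG_IN_HOME_CLUSTER : Reason.LONG_OUTSIDE_HOME_CLUSTER;
        }
        return Reason.SHORT;                                                              // rule 7
    }

    public static void main(String[] args) {
        Reason reason = classify(false, 10.0, Duration.ofHours(9), true, true);
        System.out.println(reason + " -> " + reason.status);
    }
}
```

Returning the reason rather than the bare status keeps the deciding rule attached to the result, which is the property the API response relies on.
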
## Location learning and clustering

Only NDIs longer than 7.5 hours with a position are added to the learning corpus. Position selection uses the existing resolved begin-boundary evidence and falls back to resolved end-boundary evidence.

The in-memory cache:

- accumulates observations across one or more file-session executions;
- deduplicates the same NDI across repeated/overlapping sessions;
- retains source-session provenance;
- stores the driver key on every observation;
- calculates actual-driver and other-driver views per request.

Clustering uses a Java DBSCAN implementation with Haversine distance. The defaults are a neighbourhood radius of 150 metres and a minimum of three points per cluster. Noise observations remain in the denominator for visit-share calculations but are never treated as home clusters.

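
To make the clustering step concrete, the following is a minimal, self-contained sketch of DBSCAN over latitude/longitude pairs with Haversine distance. It is not the patch's implementation: the class and method names are hypothetical, the neighbour search is a plain O(n²) scan, and `visitShare` assumes that a visit share is the cluster's observation count divided by all observations, noise included.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class HaversineDbscanSketch {

    static final double EARTH_RADIUS_M = 6_371_000.0; // mean Earth radius in metres
    static final int NOISE = -1;
    private static final int UNVISITED = -2;

    /** Great-circle distance in metres between two points given in decimal degrees. */
    static double haversineMetres(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Indices of all points within eps metres of point i (the point itself included). */
    private static List<Integer> neighbours(double[][] pts, int i, double epsMetres) {
        List<Integer> result = new ArrayList<>();
        for (int j = 0; j < pts.length; j++) {
            if (haversineMetres(pts[i][0], pts[i][1], pts[j][0], pts[j][1]) <= epsMetres) {
                result.add(j);
            }
        }
        return result;
    }

    /** Returns one label per point: a cluster id starting at 0, or NOISE. Each point is {lat, lon}. */
    static int[] cluster(double[][] pts, double epsMetres, int minPts) {
        int[] labels = new int[pts.length];
        Arrays.fill(labels, UNVISITED);
        int nextId = 0;
        for (int i = 0; i < pts.length; i++) {
            if (labels[i] != UNVISITED) continue;
            List<Integer> seeds = neighbours(pts, i, epsMetres);
            if (seeds.size() < minPts) {
                labels[i] = NOISE; // may later be relabelled as a border point
                continue;
            }
            int id = nextId++;
            labels[i] = id;
            Deque<Integer> queue = new ArrayDeque<>(seeds);
            while (!queue.isEmpty()) {
                int q = queue.poll();
                if (labels[q] == NOISE) labels[q] = id;   // border point joins the cluster
                if (labels[q] != UNVISITED) continue;     // already assigned
                labels[q] = id;
                List<Integer> reachable = neighbours(pts, q, epsMetres);
                if (reachable.size() >= minPts) queue.addAll(reachable); // q is a core point
            }
        }
        return labels;
    }

    /** Share of all observations (noise included in the denominator) that fall in one cluster. */
    static double visitShare(int[] labels, int clusterId) {
        if (labels.length == 0) return 0.0;
        long inCluster = Arrays.stream(labels).filter(label -> label == clusterId).count();
        return (double) inCluster / labels.length;
    }

    public static void main(String[] args) {
        double[][] pts = {
            {48.2000, 16.3000}, {48.2001, 16.3000}, {48.2000, 16.3001}, {47.0000, 15.0000}
        };
        System.out.println(Arrays.toString(cluster(pts, 150.0, 3)));
    }
}
```

With the documented defaults the call is `cluster(points, 150.0, 3)`. A point labelled `NOISE` never forms a home cluster, but it still counts in the `visitShare` denominator.
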
## HOME-to-HOME trips and country segmentation

A complete trip is now defined by two consecutive `HOME` classifications:

```text
start HOME NDI --active trip time and contained NOT_HOME NDIs--> end HOME NDI
```

The active trip starts at the end timestamp of the start HOME NDI and ends at the start timestamp of the end HOME NDI. The result retains both HOME classifications as boundary evidence and attaches every `NOT_HOME` classification fully contained between those boundaries. A HOME classification can close one trip and simultaneously become the start boundary of the next trip.

Data before the first HOME classification and after the last HOME classification is not emitted as a complete trip. The number of `NOT_HOME` classifications outside complete trips is returned as `unassignedNonHomeClassificationCount`.

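
The HOME-to-HOME pairing can be illustrated with a single pass over the classifications. The sketch below is an assumption-laden simplification: the record and method names are hypothetical, and the input is assumed to be sorted by start time and non-overlapping, so that everything between two HOME intervals is fully contained.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

final class HomeToHomeTripSketch {

    enum Status { HOME, NOT_HOME }

    record Ndi(String intervalId, Status status, Instant start, Instant end) {}

    record Trip(Ndi startHome, Ndi endHome, List<Ndi> containedNotHome) {
        // The active trip runs from the end of the start HOME NDI to the start of the end HOME NDI.
        Instant startedAt() { return startHome.end(); }
        Instant endedAt() { return endHome.start(); }
    }

    record Result(List<Trip> trips, int unassignedNonHomeClassificationCount) {}

    /** Convenience factory for examples; timestamps are ISO-8601 instants. */
    static Ndi ndi(String id, Status status, String startIso, String endIso) {
        return new Ndi(id, status, Instant.parse(startIso), Instant.parse(endIso));
    }

    /** Expects classifications sorted by start time and non-overlapping (assumption). */
    static Result buildTrips(List<Ndi> sorted) {
        List<Trip> trips = new ArrayList<>();
        List<Ndi> pending = new ArrayList<>();
        Ndi openHome = null;
        int unassigned = 0;
        for (Ndi current : sorted) {
            if (current.status() == Status.NOT_HOME) {
                pending.add(current);
                continue;
            }
            if (openHome != null) {
                trips.add(new Trip(openHome, current, List.copyOf(pending)));
            } else {
                unassigned += pending.size(); // NOT_HOME before the first HOME
            }
            pending.clear();
            openHome = current; // the closing HOME also opens the next trip
        }
        unassigned += pending.size(); // NOT_HOME after the last HOME, or no HOME at all
        return new Result(trips, unassigned);
    }

    public static void main(String[] args) {
        Result result = buildTrips(List.of(
            ndi("H1", Status.HOME, "2026-04-30T18:00:00Z", "2026-05-01T08:00:00Z"),
            ndi("N1", Status.NOT_HOME, "2026-05-02T20:00:00Z", "2026-05-03T05:00:00Z"),
            ndi("H2", Status.HOME, "2026-05-03T18:00:00Z", "2026-05-04T08:00:00Z")));
        System.out.println(result.trips().size() + " trip(s), "
            + result.unassignedNonHomeClassificationCount() + " unassigned");
    }
}
```
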

Every `DriverClassifiedTrip` contains:

- deterministic `tripId`;
- active `startedAt`, `endedAt`, and duration;
- `startHomeClassification`;
- `endHomeClassification`;
- `containedNonHomeClassifications`;
- driving-interval count;
- country segments calculated only inside that trip.


`DriverCountryTripSegmentationService` calculates country segments independently for each complete trip. The flat `segments` list remains available for compatibility, but is now the concatenation of all per-trip segments. Every segment contains its owning `tripId` and an explicit `countryCode`.

Evidence precedence is:

1. explicit tachograph border-crossing event (`countryFrom` / `countryTo`);
2. country code already present on a positioned support event;
3. Nominatim reverse lookup for a positioned event without a usable country code.

Country values are normalized to ISO 3166-1 alpha-2 where a mapping is known. Segment boundaries retain their evidence source:

```text
EXPLICIT_BORDER_CROSSING
GNSS_SOURCE_COUNTRY_CHANGE
NOMINATIM_COUNTRY_CHANGE
VEHICLE_CHANGE
FINAL
```

The result includes trip and segment counts, unassigned classification counts, explicit-border counts, remote lookup counts, cache-hit counts, unresolved-coordinate counts, warnings, and OpenStreetMap attribution.

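
The evidence precedence and the alpha-2 normalization described in this section can be sketched together. Everything named here is hypothetical (`CountryEvidenceSketch`, `Source`, `resolve`), and the sketch assumes that raw values arrive as ISO alpha-2 or alpha-3 codes; a real tachograph nation-code table would need its own mapping. The Nominatim lookup is passed as a `Supplier` so that it is only invoked when the first two sources yield nothing.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Supplier;

final class CountryEvidenceSketch {

    // Hypothetical labels for where a country value came from.
    enum Source { EXPLICIT_BORDER_CROSSING, SOURCE_COUNTRY_CODE, NOMINATIM_LOOKUP }

    record Resolved(String countryCode, Source source) {}

    private static final Set<String> ALPHA2 = Set.of(Locale.getISOCountries());
    private static final Map<String, String> ALPHA3_TO_ALPHA2 = new HashMap<>();

    static {
        // Build the alpha-3 -> alpha-2 table from the JDK's own ISO 3166 data.
        for (String alpha2 : ALPHA2) {
            Locale locale = new Locale.Builder().setRegion(alpha2).build();
            ALPHA3_TO_ALPHA2.put(locale.getISO3Country(), alpha2);
        }
    }

    /** Normalizes to ISO 3166-1 alpha-2 where a mapping is known, otherwise empty. */
    static Optional<String> normalize(String raw) {
        if (raw == null) return Optional.empty();
        String code = raw.trim().toUpperCase(Locale.ROOT);
        if (ALPHA2.contains(code)) return Optional.of(code);
        return Optional.ofNullable(ALPHA3_TO_ALPHA2.get(code));
    }

    /** Applies the precedence: border crossing, then source country code, then remote lookup. */
    static Optional<Resolved> resolve(String borderCountryTo, String sourceCountryCode,
                                      Supplier<Optional<String>> nominatimLookup) {
        Optional<String> code = normalize(borderCountryTo);
        if (code.isPresent()) {
            return Optional.of(new Resolved(code.get(), Source.EXPLICIT_BORDER_CROSSING));
        }
        code = normalize(sourceCountryCode);
        if (code.isPresent()) {
            return Optional.of(new Resolved(code.get(), Source.SOURCE_COUNTRY_CODE));
        }
        // Only now is the (potentially remote) lookup evaluated.
        return nominatimLookup.get()
                .flatMap(CountryEvidenceSketch::normalize)
                .map(resolved -> new Resolved(resolved, Source.NOMINATIM_LOOKUP));
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, "AUT", Optional::empty));
        System.out.println(resolve(null, null, () -> Optional.of("de")));
    }
}
```
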
## Nominatim integration

The client calls the Nominatim reverse endpoint with these query parameters:

```text
format=jsonv2
zoom=3
addressdetails=1
layer=address
```

Only `address.country_code` is required by the classification/segmentation logic. A failed lookup does not fail the whole processing plan; the coordinate remains unresolved and a diagnostic warning is returned.

Safeguards:

- identifying configurable `User-Agent`;
- optional identifying email;
- shared coordinate cache with TTL and maximum size;
- coordinate quantization for cache reuse;
- one execution-level remote lookup budget;
- fully serialized remote calls;
- configurable minimum interval;
- enforced minimum one-second interval for `nominatim.openstreetmap.org`;
- public OSM endpoint disabled unless deliberately opted in;
- configurable endpoint so a self-hosted or contracted Nominatim service can be substituted without code changes.

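
A few of these safeguards are easy to show in isolation. The sketch below builds the reverse request URL with the parameters listed above, derives a quantized cache key, and applies the one-second floor for the public host. The class and method names are hypothetical, and sending the quantized (rather than raw) coordinates in the request is an assumption of this sketch, not something the patch notes state.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.net.URI;
import java.time.Duration;

final class NominatimRequestSketch {

    static final String PUBLIC_HOST = "nominatim.openstreetmap.org";

    /** Rounds a coordinate to a fixed number of decimal places so nearby points share a cache entry. */
    static String quantize(double value, int decimalPlaces) {
        return BigDecimal.valueOf(value)
                .setScale(decimalPlaces, RoundingMode.HALF_UP)
                .toPlainString();
    }

    /** Cache key built from the quantized latitude and longitude. */
    static String cacheKey(double lat, double lon, int decimalPlaces) {
        return quantize(lat, decimalPlaces) + "," + quantize(lon, decimalPlaces);
    }

    /** Reverse-lookup URL using the documented query parameters plus lat/lon. */
    static URI reverseUri(String baseUrl, double lat, double lon, int decimalPlaces) {
        String base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        return URI.create(base + "/reverse"
                + "?format=jsonv2&zoom=3&addressdetails=1&layer=address"
                + "&lat=" + quantize(lat, decimalPlaces)
                + "&lon=" + quantize(lon, decimalPlaces));
    }

    /** The configured interval applies, except that the public OSM host never goes below one second. */
    static Duration effectiveMinimumInterval(URI endpoint, Duration configured) {
        Duration floor = Duration.ofSeconds(1);
        boolean publicHost = PUBLIC_HOST.equalsIgnoreCase(endpoint.getHost());
        return publicHost && configured.compareTo(floor) < 0 ? floor : configured;
    }

    public static void main(String[] args) {
        System.out.println(reverseUri("https://nominatim.openstreetmap.org", 48.20849, 16.37208, 4));
        System.out.println(cacheKey(48.20849, 16.37208, 4));
    }
}
```

At four decimal places a cache cell is roughly 11 metres of latitude, which is far finer than needed for a country-level (`zoom=3`) answer, so cache reuse costs no accuracy.
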
### Configuration

```yaml
eventhub:
  reverse-geocoding:
    enabled: true
    provider: NOMINATIM
    nominatim:
      base-url: https://nominatim.openstreetmap.org
      public-service-enabled: false
      user-agent: eventhub-tachograph/0.1 (Nominatim reverse geocoding)
      email: ""
      accept-language: en
      connect-timeout: 10s
      read-timeout: 20s
      minimum-request-interval: 1s
      cache-ttl: 30d
      cache-max-entries: 100000
      coordinate-decimal-places: 4
      max-remote-lookups-per-execution: 25
```

Environment variables use the `NOMINATIM_*` names shown in `application.yml`.

For a self-hosted endpoint, set `NOMINATIM_BASE_URL`; `public-service-enabled` is not needed. For deliberately selected, policy-compliant, low-volume use of the donated public endpoint, additionally set:

```text
NOMINATIM_PUBLIC_SERVICE_ENABLED=true
NOMINATIM_USER_AGENT=<application/version and contact identifier>
NOMINATIM_EMAIL=<contact email when appropriate>
```

Production or recurring tachograph batch processing should use a self-hosted instance or a provider whose terms cover the expected workload. Coordinates may reveal vehicle or driver movements; do not send confidential or personal-location data to a public endpoint without an appropriate legal and privacy basis.

## File-session learning scope

The dedicated plan defaults `ndiLearnAllFileSessionDrivers` to `true`. For a request with explicit canonical driver keys, it internally loads all drivers from the selected file sessions for location learning and then filters the response back to the originally requested drivers.

The scope is not broadened when the source is mixed or database-only, when the option is disabled, or when the result cannot safely be filtered by canonical driver key.

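
The broadening decision and the filter-back step amount to one predicate and one map filter. The following sketch uses hypothetical names (`LearningScopeSketch`, `SourceKind`, `shouldBroaden`, `filterToRequested`) and assumes that results are keyed by canonical driver key.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class LearningScopeSketch {

    // Hypothetical classification of where the request's data comes from.
    enum SourceKind { FILE_SESSIONS, MIXED, DATABASE_ONLY }

    /** Broaden only when the option is on, the source is file sessions, and filtering back is safe. */
    static boolean shouldBroaden(boolean learnAllFileSessionDrivers, SourceKind source,
                                 boolean filterableByCanonicalDriverKey) {
        return learnAllFileSessionDrivers
                && source == SourceKind.FILE_SESSIONS
                && filterableByCanonicalDriverKey;
    }

    /** Keeps only the partitions of the originally requested drivers, preserving order. */
    static <T> Map<String, T> filterToRequested(Map<String, T> resultsByDriverKey,
                                                Set<String> requestedDriverKeys) {
        Map<String, T> filtered = new LinkedHashMap<>();
        resultsByDriverKey.forEach((driverKey, partition) -> {
            if (requestedDriverKeys.contains(driverKey)) {
                filtered.put(driverKey, partition);
            }
        });
        return filtered;
    }

    public static void main(String[] args) {
        Map<String, String> all = new LinkedHashMap<>();
        all.put("driver-a", "partition A");
        all.put("driver-b", "partition B");
        System.out.println(filterToRequested(all, Set.of("driver-a")));
    }
}
```
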
## Response extensions

Each driver partition can contain:

```text
ndiHomeClassification
countryTripSegmentation
```

The fields are omitted when their optional modules were not executed, preserving the existing JSON shape for normal `driver-working-time-v1` calls.

### Trip response shape

```json
{
  "countryTripSegmentation": {
    "tripCount": 1,
    "unassignedNonHomeClassificationCount": 0,
    "trips": [
      {
        "tripId": "DRIVER_TRIP|...",
        "startedAt": "2026-05-01T08:00:00Z",
        "endedAt": "2026-05-03T18:00:00Z",
        "startHomeClassification": { "intervalId": "NDI-START", "status": "HOME" },
        "endHomeClassification": { "intervalId": "NDI-END", "status": "HOME" },
        "containedNonHomeClassifications": [
          { "intervalId": "NDI-AWAY-REST", "status": "NOT_HOME" }
        ],
        "countrySegments": [
          { "tripId": "DRIVER_TRIP|...", "countryCode": "AT" },
          { "tripId": "DRIVER_TRIP|...", "countryCode": "DE" }
        ]
      }
    ]
  }
}
```