NDI HOME / NOT_HOME classification and country trip segmentation
This patch implements the HOME / NOT_HOME classification and the country-trip segmentation described in docs/ndi_home_classification_en.md. It reuses the existing driver-working-time pipeline and adds configurable Nominatim reverse geocoding only where source country evidence is missing.
Public processing plan
Use:
driver-home-classification-v1
The dedicated plan delegates to the shared driver-working-time-v1 pipeline and explicitly inserts:
support-evidence-normalization
-> ndi-home-classification
-> country-trip-segmentation
-> driving-derived-projections
The normal driver-working-time-v1 plan keeps both modules optional. They can also be requested explicitly as ndi-home-classification and country-trip-segmentation.
Reused projection structures
DriverWorkingTimeReusableProjectionBuilder.buildAllNonDrivingIntervalCoverage(...) runs the existing Esper interruption/card-absence/GNSS enrichment pipeline with a zero rest-candidate threshold. It creates enriched evidence for every positive non-driving interruption without changing the legacy daily/weekly-rest threshold or outputs.
The implementation reuses DriverWorkingTimeRestCoverageInterval as the enriched NDI evidence model. It provides:
- previous and next driving/vehicle identities;
- NDI start, end, and duration;
- card-absence duration and percentage;
- begin/end boundary GNSS evidence;
- boundary odometer and movement evidence.
HOME / NOT_HOME classification
The rules are evaluated in the document order:
- previous and next vehicles differ -> HOME;
- card absent for more than 80% -> HOME;
- NDI longer than 24 hours -> HOME;
- no position: NDI longer than 7.5 hours -> HOME, otherwise NOT_HOME;
- positioned long NDI in a company or driver home cluster -> HOME;
- positioned long NDI outside those clusters -> NOT_HOME;
- remaining short NDI -> NOT_HOME.
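The cascade above can be sketched as a first-match-wins method. This is a minimal illustration, not the patch's code: the `Ndi` record, the `Reason` constants, and the class name are hypothetical stand-ins for `DriverWorkingTimeRestCoverageInterval` and `DriverNdiHomeClassificationReason`, and "long" is assumed to mean longer than 7.5 hours.

```java
import java.time.Duration;

/** Hypothetical sketch of the first-match-wins rule cascade; not the patch's actual classes. */
final class NdiHomeRuleSketch {

    enum Status { HOME, NOT_HOME }

    // Illustrative reason names; the real DriverNdiHomeClassificationReason values may differ.
    enum Reason {
        VEHICLE_CHANGE, CARD_ABSENT, LONGER_THAN_24H,
        NO_POSITION_LONG, NO_POSITION_SHORT,
        IN_HOME_CLUSTER, OUTSIDE_HOME_CLUSTER, SHORT_NDI
    }

    record Result(Status status, Reason reason) {}

    /** Simplified stand-in for the enriched NDI evidence. */
    record Ndi(boolean vehiclesDiffer, double cardAbsentPercent, Duration duration,
               boolean hasPosition, boolean inHomeCluster) {}

    static final Duration LONG_NDI = Duration.ofMinutes(450); // 7.5 hours
    static final Duration DAY = Duration.ofHours(24);

    static Result classify(Ndi ndi) {
        boolean isLong = ndi.duration().compareTo(LONG_NDI) > 0;
        // Rules are checked top to bottom; the first match decides status and reason.
        if (ndi.vehiclesDiffer()) return new Result(Status.HOME, Reason.VEHICLE_CHANGE);
        if (ndi.cardAbsentPercent() > 80.0) return new Result(Status.HOME, Reason.CARD_ABSENT);
        if (ndi.duration().compareTo(DAY) > 0) return new Result(Status.HOME, Reason.LONGER_THAN_24H);
        if (!ndi.hasPosition()) {
            return isLong
                    ? new Result(Status.HOME, Reason.NO_POSITION_LONG)
                    : new Result(Status.NOT_HOME, Reason.NO_POSITION_SHORT);
        }
        if (isLong) {
            return ndi.inHomeCluster()
                    ? new Result(Status.HOME, Reason.IN_HOME_CLUSTER)
                    : new Result(Status.NOT_HOME, Reason.OUTSIDE_HOME_CLUSTER);
        }
        return new Result(Status.NOT_HOME, Reason.SHORT_NDI);
    }
}
```

Because the reason is returned together with the status, a caller can see which rule fired without re-evaluating the evidence.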
Every classification contains a DriverNdiHomeClassificationReason, so the first matching rule remains visible in the API response.
Location learning and clustering
Only NDIs longer than 7.5 hours with a position are added to the corpus. Position selection uses the existing resolved begin-boundary evidence and falls back to resolved end-boundary evidence.
The in-memory cache:
- accumulates observations across one or more file-session executions;
- deduplicates the same NDI across repeated/overlapping sessions;
- retains source-session provenance;
- stores the driver key on every observation;
- calculates actual-driver and other-driver views per request.
Clustering uses Java DBSCAN with Haversine distance. The defaults are a 150-metre neighbourhood radius and a minimum of three points per cluster. Noise observations remain in the denominator for visit-share calculations but are never home clusters.
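For orientation, the distance metric and the visit-share rule might look like the sketch below. The class and method names are hypothetical, the mean Earth radius is an assumption, and the DBSCAN loop itself is omitted; only the neighbourhood test and the share calculation are shown.

```java
/** Hypothetical helper: Haversine distance plus the visit-share rule described above. */
final class HaversineSketch {

    static final double EARTH_RADIUS_M = 6_371_008.8; // mean Earth radius in metres (assumed)

    /** Great-circle distance in metres between two WGS84 coordinates given in degrees. */
    static double distanceMetres(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        // Clamp to 1.0 so rounding noise cannot push asin outside its domain.
        return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** DBSCAN neighbourhood test: are two observations within the radius? */
    static boolean isNeighbour(double lat1, double lon1, double lat2, double lon2, double epsMetres) {
        return distanceMetres(lat1, lon1, lat2, lon2) <= epsMetres;
    }

    /** Share of visits in one cluster; noise observations count in the denominator. */
    static double visitShare(int clusterObservations, int allObservationsIncludingNoise) {
        if (allObservationsIncludingNoise <= 0) {
            return 0.0;
        }
        return (double) clusterObservations / allObservationsIncludingNoise;
    }
}
```

At this scale 0.001 degrees of latitude is roughly 111 metres, so two observations that far apart fall inside the default 150-metre radius.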
HOME-to-HOME trips and country segmentation
A complete trip is now defined by two consecutive HOME classifications:
start HOME NDI --active trip time and contained NOT_HOME NDIs--> end HOME NDI
The active trip starts at the end timestamp of the start HOME NDI and ends at the start timestamp of the end HOME NDI. The result retains both HOME classifications as boundary evidence and attaches every NOT_HOME classification fully contained between those boundaries. A HOME classification can close one trip and simultaneously become the start boundary of the next trip.
Data before the first HOME classification and after the last HOME classification is not emitted as a complete trip. The number of NOT_HOME classifications outside complete trips is returned as unassignedNonHomeClassificationCount.
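The pairing rule can be illustrated with a single pass over time-ordered classifications. This is a sketch under stated assumptions: the input is sorted by start time and non-overlapping, and the `Ndi`, `Trip`, and `Result` records are hypothetical simplifications of the real `DriverClassifiedTrip` model.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of HOME-to-HOME trip pairing; not the patch's actual service. */
final class HomeToHomeTripSketch {

    enum Status { HOME, NOT_HOME }

    record Ndi(String id, Status status, Instant start, Instant end) {}

    record Trip(Ndi startHome, Ndi endHome, Instant startedAt, Instant endedAt, List<Ndi> contained) {}

    record Result(List<Trip> trips, int unassignedNonHomeCount) {}

    /** Expects classifications sorted by start time and non-overlapping. */
    static Result segment(List<Ndi> sorted) {
        List<Trip> trips = new ArrayList<>();
        List<Ndi> pending = new ArrayList<>(); // NOT_HOME NDIs seen since the last HOME
        Ndi openHome = null;
        int unassigned = 0;
        for (Ndi ndi : sorted) {
            if (ndi.status() == Status.NOT_HOME) {
                pending.add(ndi);
                continue;
            }
            if (openHome != null) {
                // Active time runs from the end of the start HOME to the start of the end HOME.
                trips.add(new Trip(openHome, ndi, openHome.end(), ndi.start(), List.copyOf(pending)));
            } else {
                unassigned += pending.size(); // data before the first HOME
            }
            pending.clear();
            openHome = ndi; // the closing HOME also opens the next trip
        }
        unassigned += pending.size(); // data after the last HOME
        return new Result(trips, unassigned);
    }
}
```

With the sequence NOT_HOME, HOME, NOT_HOME, HOME, NOT_HOME this yields one trip containing the middle NDI and an unassigned count of two.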
Every DriverClassifiedTrip contains:
- deterministic tripId;
- active startedAt, endedAt, and duration;
- startHomeClassification;
- endHomeClassification;
- containedNonHomeClassifications;
- driving-interval count;
- country segments calculated only inside that trip.
DriverCountryTripSegmentationService calculates country segments independently for each complete trip. The flat segments list remains available for compatibility, but is now the concatenation of all per-trip segments. Every segment contains its owning tripId and an explicit countryCode.
Evidence precedence is:
- explicit tachograph border-crossing event (countryFrom / countryTo);
- country code already present on a positioned support event;
- Nominatim reverse lookup for a positioned event without a usable country code.
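A minimal sketch of that precedence follows, assuming a simplified `Event` record and a pluggable reverse-lookup function. All names here are hypothetical, and normalization is reduced to trimming and upper-casing rather than a full ISO 3166-1 mapping.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.function.BiFunction;

/** Hypothetical sketch of the evidence precedence; names do not come from the patch. */
final class CountryEvidenceSketch {

    enum Source { EXPLICIT_BORDER_CROSSING, GNSS_SOURCE_COUNTRY, NOMINATIM, UNRESOLVED }

    record Resolved(String countryCode, Source source) {}

    /** Simplified event: any field may be null when the evidence is missing. */
    record Event(String borderCountryTo, String sourceCountryCode, Double lat, Double lon) {}

    static Resolved resolve(Event e, BiFunction<Double, Double, Optional<String>> reverseLookup) {
        if (usable(e.borderCountryTo())) {
            return new Resolved(normalize(e.borderCountryTo()), Source.EXPLICIT_BORDER_CROSSING);
        }
        if (usable(e.sourceCountryCode())) {
            return new Resolved(normalize(e.sourceCountryCode()), Source.GNSS_SOURCE_COUNTRY);
        }
        if (e.lat() != null && e.lon() != null) {
            // The remote lookup is only reached when no source evidence exists.
            Optional<String> code = reverseLookup.apply(e.lat(), e.lon());
            if (code.isPresent() && usable(code.get())) {
                return new Resolved(normalize(code.get()), Source.NOMINATIM);
            }
        }
        return new Resolved(null, Source.UNRESOLVED);
    }

    private static boolean usable(String code) {
        return code != null && !code.isBlank();
    }

    private static String normalize(String code) {
        return code.trim().toUpperCase(Locale.ROOT);
    }
}
```

Passing the lookup as a function keeps the precedence logic testable without network access.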
Country values are normalized to ISO 3166-1 alpha-2 where a mapping is known. Segment boundaries retain their evidence source:
EXPLICIT_BORDER_CROSSING
GNSS_SOURCE_COUNTRY_CHANGE
NOMINATIM_COUNTRY_CHANGE
VEHICLE_CHANGE
FINAL
The result includes trip and segment counts, unassigned classification counts, explicit-border counts, remote lookup counts, cache-hit counts, unresolved-coordinate counts, warnings, and OpenStreetMap attribution.
Nominatim integration
The client uses the reverse endpoint with:
format=jsonv2
zoom=3
addressdetails=1
layer=address
Only address.country_code is required by the classification/segmentation logic. Failures do not fail the whole processing plan; the coordinate remains unresolved and a diagnostic warning is returned.
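As an illustration, a reverse request with these parameters could be assembled as below. The class name, the fixed six-decimal formatting, and the example host are assumptions; the `lat` and `lon` query parameters are the standard ones for the Nominatim `/reverse` endpoint.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical sketch: builds a Nominatim reverse-lookup URI with the parameters listed above. */
final class NominatimReverseUriSketch {

    static URI reverseUri(String baseUrl, double lat, double lon) {
        // Drop a trailing slash so the path is not doubled.
        String base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        // Locale.ROOT guarantees a decimal point regardless of the JVM default locale.
        String query = String.format(Locale.ROOT,
                "lat=%.6f&lon=%.6f&format=jsonv2&zoom=3&addressdetails=1&layer=address",
                lat, lon);
        return URI.create(base + "/reverse?" + query);
    }
}
```

For example, `reverseUri("https://nominatim.example.org/", 48.2082, 16.3738)` produces `https://nominatim.example.org/reverse?lat=48.208200&lon=16.373800&format=jsonv2&zoom=3&addressdetails=1&layer=address`, where the host is a placeholder for a self-hosted instance.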
Safeguards:
- identifying configurable User-Agent;
- optional identifying email;
- shared coordinate cache with TTL and maximum size;
- coordinate quantization for cache reuse;
- one execution-level remote lookup budget;
- fully serialized remote calls;
- configurable minimum interval;
- enforced minimum one-second interval for nominatim.openstreetmap.org;
- public OSM endpoint disabled unless deliberately opted in;
- configurable endpoint so a self-hosted or contracted Nominatim service can be substituted without code changes.
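Three of these safeguards (coordinate quantization, the per-execution budget, and the minimum interval) can be sketched together. Everything here is hypothetical: the class name, the comma-separated cache-key format, and the convention of returning -1 once the budget is spent. The caller is assumed to sleep for the returned number of milliseconds before issuing the request.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Hypothetical sketch of quantized cache keys plus a budgeted, serialized request gate. */
final class LookupGuardSketch {

    private final int maxRemoteLookups;
    private final long minIntervalMillis;
    private int used;
    private Long lastCallMillis; // null until the first remote call is reserved

    LookupGuardSketch(int maxRemoteLookups, long minIntervalMillis) {
        this.maxRemoteLookups = maxRemoteLookups;
        this.minIntervalMillis = minIntervalMillis;
    }

    /** Rounds both coordinates so nearby positions share one cache entry. */
    static String cacheKey(double lat, double lon, int decimalPlaces) {
        BigDecimal qLat = BigDecimal.valueOf(lat).setScale(decimalPlaces, RoundingMode.HALF_UP);
        BigDecimal qLon = BigDecimal.valueOf(lon).setScale(decimalPlaces, RoundingMode.HALF_UP);
        return qLat.toPlainString() + "," + qLon.toPlainString();
    }

    /**
     * Reserves one remote call. Returns the milliseconds the caller must wait
     * before sending it, or -1 when the execution budget is exhausted.
     */
    synchronized long reserve(long nowMillis) {
        if (used >= maxRemoteLookups) {
            return -1;
        }
        long wait = (lastCallMillis == null)
                ? 0
                : Math.max(0, lastCallMillis + minIntervalMillis - nowMillis);
        used++;
        lastCallMillis = nowMillis + wait; // the scheduled send time of this call
        return wait;
    }
}
```

With four decimal places, coordinates that differ by only a few metres map to the same key, so repeated stops at one depot cost a single remote lookup.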
Configuration
eventhub:
  reverse-geocoding:
    enabled: true
    provider: NOMINATIM
    nominatim:
      base-url: https://nominatim.openstreetmap.org
      public-service-enabled: false
      user-agent: eventhub-tachograph/0.1 (Nominatim reverse geocoding)
      email: ""
      accept-language: en
      connect-timeout: 10s
      read-timeout: 20s
      minimum-request-interval: 1s
      cache-ttl: 30d
      cache-max-entries: 100000
      coordinate-decimal-places: 4
      max-remote-lookups-per-execution: 25
Environment variables use the NOMINATIM_* names shown in application.yml.
For a self-hosted endpoint, set NOMINATIM_BASE_URL; public-service-enabled is not needed. For deliberately selected, policy-compliant, low-volume use of the donated public endpoint, additionally set:
NOMINATIM_PUBLIC_SERVICE_ENABLED=true
NOMINATIM_USER_AGENT=<application/version and contact identifier>
NOMINATIM_EMAIL=<contact email when appropriate>
Production or recurring tachograph batch processing should use a self-hosted instance or a provider whose terms cover the expected workload. Coordinates may reveal vehicle or driver movements; do not send confidential or personal-location data to a public endpoint without an appropriate legal and privacy basis.
File-session learning scope
The dedicated plan defaults ndiLearnAllFileSessionDrivers to true. For a request with explicit canonical driver keys, it internally loads all drivers from selected file sessions for location learning and filters the response back to the originally requested drivers.
The scope is not broadened when the source is mixed/database-only, the option is disabled, or the result cannot safely be filtered by canonical driver key.
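The conditions above might reduce to a single guard plus a response filter, as in this sketch. The `SourceKind` enum, both method names, and the map-based partition model are hypothetical; only the `ndiLearnAllFileSessionDrivers` option name comes from the plan.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the learning-scope decision and the response filter. */
final class LearningScopeSketch {

    enum SourceKind { FILE_SESSIONS, DATABASE_ONLY, MIXED }

    /** True only when every condition for safely broadening the scope holds. */
    static boolean broadenScope(boolean ndiLearnAllFileSessionDrivers, SourceKind source,
                                boolean hasExplicitDriverKeys, boolean canFilterByCanonicalKey) {
        return ndiLearnAllFileSessionDrivers
                && source == SourceKind.FILE_SESSIONS
                && hasExplicitDriverKeys
                && canFilterByCanonicalKey;
    }

    /** Keeps only the partitions of the originally requested drivers, preserving order. */
    static <V> Map<String, V> filterToRequested(Map<String, V> partitionsByDriverKey,
                                                Set<String> requestedDriverKeys) {
        Map<String, V> filtered = new LinkedHashMap<>();
        for (Map.Entry<String, V> entry : partitionsByDriverKey.entrySet()) {
            if (requestedDriverKeys.contains(entry.getKey())) {
                filtered.put(entry.getKey(), entry.getValue());
            }
        }
        return filtered;
    }
}
```

Learning can then draw on every driver in the session while the response still exposes only the requested partitions.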
Response extensions
Each driver partition can contain:
ndiHomeClassification
countryTripSegmentation
The fields are omitted when their optional modules were not executed, preserving the existing JSON shape for normal driver-working-time-v1 calls.
Trip response shape
{
  "countryTripSegmentation": {
    "tripCount": 1,
    "unassignedNonHomeClassificationCount": 0,
    "trips": [
      {
        "tripId": "DRIVER_TRIP|...",
        "startedAt": "2026-05-01T08:00:00Z",
        "endedAt": "2026-05-03T18:00:00Z",
        "startHomeClassification": { "intervalId": "NDI-START", "status": "HOME" },
        "endHomeClassification": { "intervalId": "NDI-END", "status": "HOME" },
        "containedNonHomeClassifications": [
          { "intervalId": "NDI-AWAY-REST", "status": "NOT_HOME" }
        ],
        "countrySegments": [
          { "tripId": "DRIVER_TRIP|...", "countryCode": "AT" },
          { "tripId": "DRIVER_TRIP|...", "countryCode": "DE" }
        ]
      }
    ]
  }
}