NDI HOME / NOT_HOME classification and country trip segmentation

This patch implements the HOME / NOT_HOME classification and the country-trip segmentation described in docs/ndi_home_classification_en.md. It reuses the existing driver-working-time pipeline and adds configurable Nominatim reverse geocoding only where source country evidence is missing.

Public processing plan

Use:

driver-home-classification-v1

The dedicated plan delegates to the shared driver-working-time-v1 pipeline and explicitly inserts:

support-evidence-normalization
-> ndi-home-classification
-> country-trip-segmentation
-> driving-derived-projections

The normal driver-working-time-v1 plan keeps both modules optional. They can also be requested explicitly as ndi-home-classification and country-trip-segmentation.

Reused projection structures

DriverWorkingTimeReusableProjectionBuilder.buildAllNonDrivingIntervalCoverage(...) runs the existing Esper interruption/card-absence/GNSS enrichment pipeline with a zero rest-candidate threshold. It creates enriched evidence for every positive non-driving interruption without changing the legacy daily/weekly-rest threshold or outputs.

The implementation reuses DriverWorkingTimeRestCoverageInterval as the enriched NDI evidence model. It provides:

  • previous and next driving/vehicle identities;
  • NDI start, end, and duration;
  • card-absence duration and percentage;
  • begin/end boundary GNSS evidence;
  • boundary odometer and movement evidence.

HOME / NOT_HOME classification

The rules are evaluated in document order, and the first matching rule determines the classification:

  1. previous and next vehicles differ -> HOME;
  2. card absent for more than 80% -> HOME;
  3. NDI longer than 24 hours -> HOME;
  4. no position: NDI longer than 7.5 hours -> HOME, otherwise NOT_HOME;
  5. positioned long NDI in a company or driver home cluster -> HOME;
  6. positioned long NDI outside those clusters -> NOT_HOME;
  7. remaining short NDI -> NOT_HOME.

Every classification contains a DriverNdiHomeClassificationReason, so the first matching rule remains visible in the API response.
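The first-match evaluation can be sketched as below. This is a minimal illustration, not the shipped implementation: NdiEvidence, NdiClass, NdiReason, NdiResult and NdiRuleSketch are hypothetical names, the reason codes are invented for the example, and "long" is assumed to mean longer than 7.5 hours, matching the threshold in rule 4.

```java
import java.time.Duration;
import java.util.Objects;

/** Hypothetical, simplified view of one enriched NDI; not the real evidence model. */
record NdiEvidence(String previousVehicle, String nextVehicle, Duration duration,
                   double cardAbsentPercent, boolean hasPosition, boolean inHomeCluster) {}

enum NdiClass { HOME, NOT_HOME }

/** Illustrative reason codes, one per rule outcome; the real enum values may differ. */
enum NdiReason {
    VEHICLE_CHANGE, CARD_ABSENT, LONGER_THAN_24H, NO_POSITION_LONG,
    NO_POSITION_SHORT, HOME_CLUSTER, OUTSIDE_HOME_CLUSTER, SHORT_NDI
}

record NdiResult(NdiClass classification, NdiReason reason) {}

final class NdiRuleSketch {
    static final Duration LONG_NDI = Duration.ofMinutes(450); // 7.5 hours (assumed "long" threshold)
    static final Duration ONE_DAY = Duration.ofHours(24);

    static NdiResult classify(NdiEvidence e) {
        boolean isLong = e.duration().compareTo(LONG_NDI) > 0;
        // Rule 1: previous and next vehicles differ.
        if (!Objects.equals(e.previousVehicle(), e.nextVehicle())) {
            return new NdiResult(NdiClass.HOME, NdiReason.VEHICLE_CHANGE);
        }
        // Rule 2: card absent for more than 80% of the NDI.
        if (e.cardAbsentPercent() > 80.0) {
            return new NdiResult(NdiClass.HOME, NdiReason.CARD_ABSENT);
        }
        // Rule 3: NDI longer than 24 hours.
        if (e.duration().compareTo(ONE_DAY) > 0) {
            return new NdiResult(NdiClass.HOME, NdiReason.LONGER_THAN_24H);
        }
        // Rule 4: no position, so only the duration decides.
        if (!e.hasPosition()) {
            return isLong
                    ? new NdiResult(NdiClass.HOME, NdiReason.NO_POSITION_LONG)
                    : new NdiResult(NdiClass.NOT_HOME, NdiReason.NO_POSITION_SHORT);
        }
        // Rules 5 and 6: positioned long NDI inside or outside a home cluster.
        if (isLong) {
            return e.inHomeCluster()
                    ? new NdiResult(NdiClass.HOME, NdiReason.HOME_CLUSTER)
                    : new NdiResult(NdiClass.NOT_HOME, NdiReason.OUTSIDE_HOME_CLUSTER);
        }
        // Rule 7: remaining short NDI.
        return new NdiResult(NdiClass.NOT_HOME, NdiReason.SHORT_NDI);
    }
}
```

Because each rule returns immediately, a two-hour NDI with a vehicle change is reported as HOME with the vehicle-change reason, regardless of its position or cluster membership.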

Location learning and clustering

Only NDIs longer than 7.5 hours with a position are added to the corpus. Position selection uses the existing resolved begin-boundary evidence and falls back to resolved end-boundary evidence when no begin-boundary position is available.

The in-memory cache:

  • accumulates observations across one or more file-session executions;
  • deduplicates the same NDI across repeated/overlapping sessions;
  • retains source-session provenance;
  • stores the driver key on every observation;
  • calculates actual-driver and other-driver views per request.

Clustering uses a Java DBSCAN implementation with Haversine distance. The defaults are a neighbourhood radius of 150 metres and a minimum of three points. Noise observations remain in the denominator for visit-share calculations but are never treated as home clusters.
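To make the clustering step concrete, here is a compact textbook DBSCAN over latitude/longitude pairs with a Haversine metric. It is a sketch only: HaversineDbscanSketch and its method names are hypothetical, the Earth radius is the common mean value of 6,371 km, and visitShare shows one plausible reading of "noise stays in the denominator".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class HaversineDbscanSketch {
    static final double EARTH_RADIUS_M = 6_371_000.0; // mean Earth radius (assumption)
    static final int NOISE = -1;
    private static final int UNVISITED = -2;

    /** Great-circle distance in metres between two points given in degrees. */
    static double haversineMetres(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** points[i] = {lat, lon}; returns a cluster id per point, or NOISE. */
    static int[] cluster(double[][] points, double epsMetres, int minPoints) {
        int[] labels = new int[points.length];
        Arrays.fill(labels, UNVISITED);
        int clusterId = 0;
        for (int i = 0; i < points.length; i++) {
            if (labels[i] != UNVISITED) continue;
            List<Integer> seeds = neighbours(points, i, epsMetres);
            if (seeds.size() < minPoints) { labels[i] = NOISE; continue; }
            labels[i] = clusterId;
            Deque<Integer> queue = new ArrayDeque<>(seeds);
            while (!queue.isEmpty()) {
                int j = queue.poll();
                if (labels[j] == NOISE) labels[j] = clusterId; // former noise becomes a border point
                if (labels[j] != UNVISITED) continue;
                labels[j] = clusterId;
                List<Integer> next = neighbours(points, j, epsMetres);
                if (next.size() >= minPoints) queue.addAll(next); // j is a core point: expand
            }
            clusterId++;
        }
        return labels;
    }

    /** Indices within epsMetres of points[index], including the point itself. */
    static List<Integer> neighbours(double[][] points, int index, double epsMetres) {
        List<Integer> result = new ArrayList<>();
        for (int k = 0; k < points.length; k++) {
            double d = haversineMetres(points[index][0], points[index][1], points[k][0], points[k][1]);
            if (d <= epsMetres) result.add(k);
        }
        return result;
    }

    /** Share of all observations (noise included) that fall in the given cluster. */
    static double visitShare(int[] labels, int clusterId) {
        if (labels.length == 0) return 0.0;
        long hits = Arrays.stream(labels).filter(l -> l == clusterId).count();
        return (double) hits / labels.length;
    }
}
```

With the defaults, `cluster(points, 150.0, 3)` labels three observations a few tens of metres apart as one cluster and an isolated fourth observation as NOISE; the visit share of that cluster is then 3 of 4 rather than 3 of 3. The neighbour search here is quadratic, which is acceptable for a sketch but may need a spatial index for a large corpus.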

Country trip segmentation

DriverCountryTripSegmentationService builds country segments over driving intervals.

Evidence precedence is:

  1. explicit tachograph border-crossing event (countryFrom / countryTo);
  2. country code already present on a positioned support event;
  3. Nominatim reverse lookup for a positioned event without a usable country code.

Country values are normalized to ISO 3166-1 alpha-2 where a mapping is known. Segment boundaries retain their evidence source:

EXPLICIT_BORDER_CROSSING
GNSS_SOURCE_COUNTRY_CHANGE
NOMINATIM_COUNTRY_CHANGE
VEHICLE_CHANGE
FINAL

The result includes segment counts, explicit-border counts, remote lookup counts, cache-hit counts, unresolved-coordinate counts, warnings, and OpenStreetMap attribution.
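The evidence precedence for a single event can be illustrated as follows. This is a sketch under stated assumptions: SupportEvent, CountrySource, ResolvedCountry and CountryEvidenceSketch are hypothetical names, the reverse lookup is passed in as a plain function rather than the real Nominatim client, and "usable" is simplified to "a two-letter code", whereas the actual normalization maps further source formats.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.function.BiFunction;

/** Hypothetical simplified event; null fields mean "not available". */
record SupportEvent(String borderCountryTo, String sourceCountryCode,
                    Double latitude, Double longitude) {}

enum CountrySource { EXPLICIT_BORDER_CROSSING, SOURCE_COUNTRY_CODE, NOMINATIM }

record ResolvedCountry(String code, CountrySource source) {}

final class CountryEvidenceSketch {

    static Optional<ResolvedCountry> resolve(
            SupportEvent event,
            BiFunction<Double, Double, Optional<String>> reverseLookup) {
        // 1. An explicit border-crossing event wins outright.
        if (usable(event.borderCountryTo())) {
            return Optional.of(new ResolvedCountry(
                    normalize(event.borderCountryTo()), CountrySource.EXPLICIT_BORDER_CROSSING));
        }
        // The remaining sources apply to positioned events only.
        if (event.latitude() == null || event.longitude() == null) {
            return Optional.empty();
        }
        // 2. A country code already carried by the positioned event.
        if (usable(event.sourceCountryCode())) {
            return Optional.of(new ResolvedCountry(
                    normalize(event.sourceCountryCode()), CountrySource.SOURCE_COUNTRY_CODE));
        }
        // 3. Reverse lookup only when nothing cheaper is available.
        return reverseLookup.apply(event.latitude(), event.longitude())
                .filter(CountryEvidenceSketch::usable)
                .map(code -> new ResolvedCountry(normalize(code), CountrySource.NOMINATIM));
    }

    /** Simplification: treat any two-character code as usable. */
    static boolean usable(String code) {
        return code != null && code.trim().length() == 2;
    }

    /** Simplification: upper-case only; no mapping from other code systems. */
    static String normalize(String code) {
        return code.trim().toUpperCase(Locale.ROOT);
    }
}
```

An empty result corresponds to an unresolved coordinate. Ordering the checks this way means the remote lookup is never invoked for events that already carry country evidence, which keeps the lookup budget for the cases that need it.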

Nominatim integration

The client uses the reverse endpoint with:

format=jsonv2
zoom=3
addressdetails=1
layer=address

Only address.country_code is required by the classification/segmentation logic. Failures do not fail the whole processing plan; the coordinate remains unresolved and a diagnostic warning is returned.
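For orientation, a request with these parameters could be assembled as shown below. This is an illustrative sketch rather than the client's actual code: NominatimReverseUriSketch and reverseUri are hypothetical names, and the six-decimal coordinate formatting is an assumption. The lat and lon query parameters are the standard inputs of the Nominatim reverse endpoint.

```java
import java.net.URI;
import java.util.Locale;

final class NominatimReverseUriSketch {

    /** Builds the reverse-geocoding URI for one coordinate against a configurable base URL. */
    static URI reverseUri(String baseUrl, double latitude, double longitude) {
        // Tolerate a trailing slash in the configured base URL.
        String base = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        // Locale.ROOT keeps the decimal separator a dot on every platform.
        String query = String.format(Locale.ROOT,
                "lat=%.6f&lon=%.6f&format=jsonv2&zoom=3&addressdetails=1&layer=address",
                latitude, longitude);
        return URI.create(base + "/reverse?" + query);
    }
}
```

Calling `reverseUri("https://nominatim.example.org/", 52.5, 13.4)` yields `https://nominatim.example.org/reverse?lat=52.500000&lon=13.400000&format=jsonv2&zoom=3&addressdetails=1&layer=address`. The host in this call is a placeholder.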

Safeguards:

  • identifying configurable User-Agent;
  • optional identifying email;
  • shared coordinate cache with TTL and maximum size;
  • coordinate quantization for cache reuse;
  • one execution-level remote lookup budget;
  • fully serialized remote calls;
  • configurable minimum interval;
  • enforced minimum one-second interval for nominatim.openstreetmap.org;
  • public OSM endpoint disabled unless deliberately opted in;
  • configurable endpoint so a self-hosted or contracted Nominatim service can be substituted without code changes.
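Two of these safeguards, coordinate quantization and the enforced interval for the public host, are easy to picture in code. The sketch below is an assumption-laden illustration: NominatimSafeguardSketch, cacheKey and effectiveInterval are hypothetical names, and half-up rounding is chosen arbitrarily because the rounding mode is not specified here.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.time.Duration;

final class NominatimSafeguardSketch {
    static final String PUBLIC_HOST = "nominatim.openstreetmap.org";
    static final Duration PUBLIC_MINIMUM = Duration.ofSeconds(1);

    /** Nearby coordinates collapse to the same key once rounded to decimalPlaces. */
    static String cacheKey(double latitude, double longitude, int decimalPlaces) {
        return quantize(latitude, decimalPlaces) + "," + quantize(longitude, decimalPlaces);
    }

    private static String quantize(double value, int decimalPlaces) {
        // Rounding mode is an assumption made for this sketch.
        return BigDecimal.valueOf(value)
                .setScale(decimalPlaces, RoundingMode.HALF_UP)
                .toPlainString();
    }

    /** The public host never gets an interval shorter than one second. */
    static Duration effectiveInterval(String host, Duration configured) {
        boolean isPublic = PUBLIC_HOST.equalsIgnoreCase(host);
        return isPublic && configured.compareTo(PUBLIC_MINIMUM) < 0
                ? PUBLIC_MINIMUM
                : configured;
    }
}
```

With four decimal places, `cacheKey(52.520008, 13.404954, 4)` and `cacheKey(52.52003, 13.40496, 4)` both produce `52.5200,13.4050`, so the second coordinate can be answered from the cache. Four decimal places correspond to roughly 11 metres of latitude, which is far finer than a country-level lookup needs.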

Configuration

eventhub:
  reverse-geocoding:
    enabled: true
    provider: NOMINATIM
    nominatim:
      base-url: https://nominatim.openstreetmap.org
      public-service-enabled: false
      user-agent: eventhub-tachograph/0.1 (Nominatim reverse geocoding)
      email: ""
      accept-language: en
      connect-timeout: 10s
      read-timeout: 20s
      minimum-request-interval: 1s
      cache-ttl: 30d
      cache-max-entries: 100000
      coordinate-decimal-places: 4
      max-remote-lookups-per-execution: 25

Environment variables use the NOMINATIM_* names shown in application.yml.

For a self-hosted endpoint, set NOMINATIM_BASE_URL; public-service-enabled is not needed. For deliberately selected, policy-compliant, low-volume use of the donated public endpoint, additionally set:

NOMINATIM_PUBLIC_SERVICE_ENABLED=true
NOMINATIM_USER_AGENT=<application/version and contact identifier>
NOMINATIM_EMAIL=<contact email when appropriate>

Production or recurring tachograph batch processing should use a self-hosted instance or a provider whose terms cover the expected workload. Coordinates may reveal vehicle or driver movements; do not send confidential or personal-location data to a public endpoint without an appropriate legal and privacy basis.

File-session learning scope

The dedicated plan defaults ndiLearnAllFileSessionDrivers to true. For a request with explicit canonical driver keys, it internally loads all drivers from selected file sessions for location learning and filters the response back to the originally requested drivers.

The scope is not broadened when the source is mixed/database-only, the option is disabled, or the result cannot safely be filtered by canonical driver key.
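The conditions above reduce to a single conjunction. The following sketch uses hypothetical names throughout (LearningScopeSketch, SourceKind, broadenLearningScope and its parameters) and only restates the stated conditions; it does not describe how the service detects each one.

```java
final class LearningScopeSketch {

    /** Hypothetical classification of where the request's data comes from. */
    enum SourceKind { FILE_SESSIONS, MIXED, DATABASE_ONLY }

    /**
     * Returns true when location learning may load all drivers of the selected
     * file sessions while the response is filtered back to the requested drivers.
     */
    static boolean broadenLearningScope(boolean learnAllFileSessionDrivers,
                                        SourceKind source,
                                        boolean hasExplicitDriverKeys,
                                        boolean filterableByDriverKey) {
        return learnAllFileSessionDrivers          // option enabled (plan default: true)
                && hasExplicitDriverKeys           // otherwise there is nothing to broaden
                && source == SourceKind.FILE_SESSIONS
                && filterableByDriverKey;          // response can be safely narrowed again
    }
}
```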

Response extensions

Each driver partition can contain:

ndiHomeClassification
countryTripSegmentation

The fields are omitted when their optional modules were not executed, preserving the existing JSON shape for normal driver-working-time-v1 calls.