

1. Terminology and source data

The algorithm uses data from:

  • M_: vehicle-unit tachograph data, especially GNSS positions.
  • C_: driver-card data, especially activities and card insertion/removal events.

The main objects are:

DI — Driving Interval

A continuous interval in which the driver is driving.

It contains:

  • driver
  • vehicle
  • start and end time
  • start and end positions
  • optional GNSS trace points

NDI — Non-Driving Interval

The gap between two consecutive driving intervals.

It contains:

  • driver
  • vehicle before the interval
  • vehicle after the interval
  • start and end time
  • inferred position
  • card-removal interval
  • location cluster
  • classification: HOME or NOT_HOME

In this specification, HOME does not necessarily mean the driver's private residence. It means that the interruption is treated as a home/base interruption. It can also represent:

  • a company depot,
  • a vehicle change,
  • a long card-removal period,
  • or simply a rest longer than 24 hours.
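
The two objects above could be modelled roughly as follows. This is a minimal Python sketch, not the actual data model: all class and field names, the (latitude, longitude) position tuple, and the use of an integer cluster label are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional

# Assumption: a position is a (latitude, longitude) pair in degrees.
Position = tuple[float, float]


class Status(Enum):
    HOME = "HOME"
    NOT_HOME = "NOT_HOME"


@dataclass
class DrivingInterval:
    driver_id: str
    vehicle_id: str
    start: datetime
    end: datetime
    pos_start: Optional[Position] = None
    pos_end: Optional[Position] = None
    # Optional GNSS trace as (timestamp, position) pairs.
    gnss_trace: list[tuple[datetime, Position]] = field(default_factory=list)


@dataclass
class NonDrivingInterval:
    driver_id: str
    vehicle_start: str               # vehicle before the interval
    vehicle_end: str                 # vehicle after the interval
    start: datetime
    end: datetime
    pos: Optional[Position] = None   # inferred position, may be unknown
    card_out: Optional[tuple[datetime, datetime]] = None  # card-removal interval
    cluster: Optional[int] = None    # location cluster label, if any
    status: Optional[Status] = None  # filled in by the classification step
```

The position, cluster and status fields are optional because they are only known after the later inference steps have run.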

2. How an NDI is built

Driving intervals are grouped by driver and ordered chronologically.

For every pair of consecutive driving intervals:

previous driving interval
        ↓
non-driving interval
        ↓
next driving interval

The NDI is created as follows:

NDI.start = previous DI.end
NDI.end   = next DI.start

The vehicles are:

NDI.vehicleStart = previous DI.vehicleId
NDI.vehicleEnd   = next DI.vehicleId

The position is selected using:

previous DI end position
otherwise
next DI start position
otherwise
no position

So the exact position rule is:

NDI.pos = previous.posEnd ?? next.posStart

The driver-card events between the two driving intervals are used to find the card-removal interval.
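
The construction rules above can be sketched in Python as follows. The class names, field names and the build_ndis function are hypothetical, and the card-removal lookup is left out because its details are not described here.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import groupby
from typing import Optional

Position = tuple[float, float]  # assumption: (latitude, longitude)


@dataclass
class DI:
    driver_id: str
    vehicle_id: str
    start: datetime
    end: datetime
    pos_start: Optional[Position] = None
    pos_end: Optional[Position] = None


@dataclass
class NDI:
    driver_id: str
    vehicle_start: str
    vehicle_end: str
    start: datetime
    end: datetime
    pos: Optional[Position]


def build_ndis(intervals: list[DI]) -> list[NDI]:
    """Create one NDI between every pair of consecutive DIs of the same driver."""
    ndis = []
    ordered = sorted(intervals, key=lambda di: (di.driver_id, di.start))
    for _, group in groupby(ordered, key=lambda di: di.driver_id):
        dis = list(group)
        for prev, nxt in zip(dis, dis[1:]):
            # Equivalent of: previous.posEnd ?? next.posStart
            pos = prev.pos_end if prev.pos_end is not None else nxt.pos_start
            ndis.append(NDI(prev.driver_id, prev.vehicle_id, nxt.vehicle_id,
                            prev.end, nxt.start, pos))
    return ndis
```

Because the pairs come from zip(dis, dis[1:]), a driver with a single driving interval produces no NDI, which matches the limitations listed below.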

Important limitations

The algorithm creates NDIs only between consecutive driving intervals. It does not create:

  • an NDI before the first driving interval,
  • an NDI after the final driving interval,
  • or an NDI when only card/activity data exists without surrounding driving intervals.

3. How home locations are learned

Only NDIs satisfying both conditions are used for location learning:

duration > 7.5 hours
position is known

Exactly 7.5 hours does not qualify because the comparison is strictly >.

3.1 Location clustering

The positions are clustered globally with DBSCAN:

Parameter | Value
Maximum cluster distance | 150 metres
Minimum points | 3
Distance calculation | Haversine/PostGIS
Unclustered points | NOISE

The qualifying NDIs of all drivers appear to be clustered together in one global clustering operation.
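
To make the parameters concrete, here is a small pure-Python DBSCAN over haversine distances. It is an illustrative sketch only, since the real clustering is described as Haversine/PostGIS. The function names, the Earth radius constant, the label -1 for NOISE, and the convention that the minimum of 3 points includes the point itself are all assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, an approximation
NOISE = -1                    # assumed label for unclustered points


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def dbscan(points, eps_m=150.0, min_pts=3):
    """Return one cluster label per point; NOISE for unclustered points."""
    labels = [None] * len(points)

    def neighbours(i):
        # Brute-force O(n) scan; includes point i itself.
        return [j for j in range(len(points))
                if haversine_m(points[i], points[j]) <= eps_m]

    next_cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = next_cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = next_cluster  # border point, not expanded
            if labels[j] is not None:
                continue
            labels[j] = next_cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
        next_cluster += 1
    return labels
```

The brute-force neighbour search is quadratic in the number of points, which is acceptable for a sketch but not for a large dataset, where a spatial index would normally be used.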

3.2 Company-home locations

A cluster becomes a company-home location, normally interpreted as a depot, when:

visits to cluster / all long positioned NDIs > 25%

The denominator is all NDIs in the complete dataset that:

  • are longer than 7.5 hours, and
  • have a position.

Noise points are excluded as possible company-home clusters.

3.3 Driver-home locations

For each driver, a cluster becomes a private driver-home location when:

driver's visits to cluster /
driver's long positioned NDIs > 25%

Additionally:

cluster must not already be a company-home cluster

Therefore, a location used frequently by the whole company is classified as a company depot rather than a private driver home.

Exact threshold behaviour

The comparisons are strict:

  • exactly 25% is not enough;
  • the share must be greater than 25%.
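
The two share rules can be sketched together as below. The input shape (one (driver, cluster label) pair per qualifying NDI), the function name and the NOISE label are hypothetical. Excluding noise points from driver-home clusters is also an assumption, because the text states this explicitly only for company-home clusters.

```python
from collections import Counter

NOISE = -1  # assumed label for unclustered points


def learn_home_clusters(visits):
    """visits: one (driver_id, cluster_label) pair per long, positioned NDI.

    Returns (company_home_clusters, {driver_id: driver_home_clusters}).
    """
    total = len(visits)
    per_cluster = Counter(c for _, c in visits if c != NOISE)
    # Strict "share > 25%" written as integer arithmetic: n / total > 1/4.
    company = {c for c, n in per_cluster.items() if 4 * n > total}

    per_driver_total = Counter(d for d, _ in visits)  # includes noise visits
    per_driver_cluster = Counter((d, c) for d, c in visits if c != NOISE)
    driver_home = {}
    for (driver, cluster), n in per_driver_cluster.items():
        if cluster not in company and 4 * n > per_driver_total[driver]:
            driver_home.setdefault(driver, set()).add(cluster)
    return company, driver_home
```

For example, a driver with four qualifying NDIs of which exactly one falls in a cluster has a share of exactly 25%, so that cluster does not become a driver-home location.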

4. Rules for determining whether an NDI is HOME or NOT_HOME

The rules are evaluated in a fixed priority order. The first matching rule wins.

Decision table

Priority | Condition | Result
A | Vehicle before NDI differs from vehicle after NDI | HOME
B | Card is removed for more than 80% of the NDI | HOME
C | NDI duration is more than 24 hours | HOME
D1 | Position unknown and duration more than 7.5 hours | HOME
D2 | Position unknown and duration no more than 7.5 hours | NOT_HOME
E1 | Position known, duration more than 7.5 hours, and position belongs to company-home or driver-home cluster | HOME
E2 | Position known, duration more than 7.5 hours, but position is not a recognised home cluster | NOT_HOME
F | All remaining short NDIs | NOT_HOME

Rule A: Change of vehicle

vehicleStart != vehicleEnd → HOME

When the next driving interval starts in another vehicle, the NDI is always treated as HOME.

This rule has the highest priority. It applies even when:

  • the NDI is short,
  • the position is known to be away from home,
  • or the driver card remained inserted for much of the interval.

This is therefore a technical trip-separation rule rather than proof that the driver was physically at home.

Rule B: Card removed for more than 80%

cardOut duration > 80% of NDI duration → HOME

Exactly 80% is not sufficient.

Example:

NDI duration:       10 hours
Card removed:        8 hours
Result:              NOT enough for Rule B

Card removed:        8 hours 1 minute
Result:              HOME
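
The strict boundary in this example can be checked without floating-point division. A minimal Python sketch, with a hypothetical function name:

```python
from datetime import timedelta


def card_out_exceeds_80_percent(card_out: timedelta, ndi: timedelta) -> bool:
    # card_out / ndi > 4/5, rewritten as 5 * card_out > 4 * ndi so that
    # the exact 80% boundary is compared without rounding.
    return 5 * card_out > 4 * ndi
```

With a 10-hour NDI, 8 hours of card removal gives 40 h > 40 h, which is false, while 8 hours 1 minute gives 40 h 5 min > 40 h, which is true.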

The data model contains only one cardOut interval. It is not defined how several card-removal periods inside one NDI should be combined.

Rule C: NDI longer than 24 hours

NDI duration > 24 hours → HOME

Exactly 24 hours is not sufficient.

This rule overrides the position logic. Even when the driver remains at an unrecognised remote location, an NDI longer than 24 hours is classified as HOME.

Rule D: No known position

When the position cannot be determined:

duration > 7.5 hours → HOME
duration <= 7.5 hours → NOT_HOME

This is a fallback assumption. A long rest without location evidence is assumed to be home.

Rule E: Long NDI with a known position

For an NDI longer than 7.5 hours:

position in company-home cluster → HOME
position in driver's home cluster → HOME
otherwise → NOT_HOME

The specification describes the final case as an overnight stay in the vehicle, although the data itself only establishes that the rest occurred away from a recognised home location.

Rule F: Short NDI

A short NDI defaults to:

NOT_HOME

However, a short NDI can still be classified as HOME by the earlier rules:

  • vehicle changed, or
  • card removed for more than 80% of the interval.

5. Compact HOME/NOT_HOME decision flow

Did the vehicle change?
 ├─ Yes → HOME
 └─ No
     │
     ├─ Was the card removed for more than 80% of the NDI?
     │   ├─ Yes → HOME
     │   └─ No
     │
     ├─ Is the NDI longer than 24 hours?
     │   ├─ Yes → HOME
     │   └─ No
     │
     ├─ Is the position missing?
     │   ├─ Yes and duration > 7.5 h → HOME
     │   ├─ Yes and duration <= 7.5 h → NOT_HOME
     │   └─ No
     │
     ├─ Is the NDI longer than 7.5 hours?
     │   ├─ No → NOT_HOME
     │   └─ Yes
     │       ├─ Company-home cluster → HOME
     │       ├─ Driver-home cluster → HOME
     │       └─ Other location/noise → NOT_HOME
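
The same decision flow can be written as a single first-match-wins function. This is a sketch under assumptions: the NDI fields, the classify name and the home_clusters argument (the union of company-home clusters and the driver's own home clusters) are hypothetical, not the actual implementation.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

LONG_REST = timedelta(hours=7.5)
ONE_DAY = timedelta(hours=24)


@dataclass
class NDI:
    vehicle_start: str
    vehicle_end: str
    duration: timedelta
    card_out: timedelta = timedelta(0)  # total card-removal time inside the NDI
    has_position: bool = True
    cluster: Optional[int] = None       # None = noise or no cluster


def classify(ndi: NDI, home_clusters: set[int]) -> str:
    """Evaluate rules A to F in priority order; the first match wins."""
    if ndi.vehicle_start != ndi.vehicle_end:      # Rule A
        return "HOME"
    if 5 * ndi.card_out > 4 * ndi.duration:       # Rule B: strictly more than 80%
        return "HOME"
    if ndi.duration > ONE_DAY:                    # Rule C
        return "HOME"
    is_long = ndi.duration > LONG_REST
    if not ndi.has_position:                      # Rules D1 / D2
        return "HOME" if is_long else "NOT_HOME"
    if is_long and ndi.cluster in home_clusters:  # Rule E1
        return "HOME"
    return "NOT_HOME"                             # Rules E2 and F
```

Keeping the rules as early returns in table order makes the priority explicit: a vehicle change returns HOME before any duration or position check is reached.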

6. How trip segments are currently determined

The code does not create a trip object. It creates TripSegment records based on country changes.

Each segment contains:

  • driver
  • vehicle
  • start and end time
  • country before and after the boundary
  • start and end positions

The algorithm works as follows:

  1. Take all driving intervals belonging to one driver.
  2. Sort them chronologically.
  3. Start at the beginning of the first driving interval.
  4. Determine the country of the first position.
  5. Scan all GNSS trace points in all driving intervals.
  6. Reverse-geocode every trace point to a country.
  7. When the country changes:
    • close the current segment at that trace-point timestamp;
    • store the old and new country;
    • begin a new segment at the same trace point.
  8. After all trace points, close the final segment at the end of the final driving interval.

Example:

08:00 Driving starts in Austria
10:15 GNSS position changes from Austria to Germany
14:00 GNSS position changes from Germany to France
18:00 Final driving interval ends

Segments produced:

Segment 1: 08:00–10:15, Austria → Germany
Segment 2: 10:15–14:00, Germany → France
Segment 3: 14:00–18:00, France → France

The final segment has the same countryFrom and countryTo because no further border was crossed.
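
The scan described above can be sketched as follows. The function and field names are hypothetical, driver and vehicle fields are omitted for brevity, and country_of stands in for whatever reverse-geocoding lookup the real code uses.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable

Position = tuple[float, float]  # assumption: (latitude, longitude)


@dataclass
class Segment:
    start: datetime
    end: datetime
    country_from: str
    country_to: str


def build_segments(
    start: datetime,                        # start of the first driving interval
    start_pos: Position,                    # first known position
    trace: list[tuple[datetime, Position]], # all trace points, chronological
    end: datetime,                          # end of the final driving interval
    country_of: Callable[[Position], str],  # hypothetical reverse geocoder
) -> list[Segment]:
    segments = []
    seg_start, current = start, country_of(start_pos)
    for ts, pos in trace:
        country = country_of(pos)
        if country != current:
            # Close the segment at the trace point and open the next one there.
            segments.append(Segment(seg_start, ts, current, country))
            seg_start, current = ts, country
    # The final segment has no further border, so both countries are equal.
    segments.append(Segment(seg_start, end, current, current))
    return segments
```

Fed with trace points matching the example (Austria until 10:15, Germany until 14:00, then France), this returns the three segments listed above.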

7. What currently determines a “trip”

Strictly speaking, the specification does not determine individual trips.

For each driver, the segment-building function starts at:

first driving interval start

and continues until:

last driving interval end

It splits this complete period only at country changes.

This means that if the input covers an entire month, the algorithm may effectively process the whole month as one continuous sequence of country segments—even when the driver returned home several times.

The HOME and NOT_HOME classifications are not passed into buildTripSegments(). In fact, trip segments are built before the NDIs are classified:

build NDIs
build trip segments
cluster NDIs
determine home locations
classify NDIs

Consequently:

HOME NDI does not end a trip
NOT_HOME NDI does not explicitly continue a trip

The two parts of the algorithm are currently disconnected.

8. Likely intended trip definition

Based on the purpose of the HOME/NOT_HOME classification, the intended definition is most likely:

A trip is a maximal chronological sequence of driving intervals separated only by NOT_HOME NDIs. A HOME NDI closes the current trip and separates it from the next trip.

That would produce the following rules:

Trip start

A trip begins:

  • at the first available driving interval, or
  • at the first driving interval following a HOME NDI.

Trip continuation

The same trip continues across an NDI when:

NDI.status = NOT_HOME

This includes:

  • short breaks,
  • overnight rests away from recognised home locations,
  • rest in the vehicle,
  • long rests at remote locations up to 24 hours.

Trip end

A trip ends at the end of the driving interval preceding an NDI when:

NDI.status = HOME

The next driving interval begins a new trip.

Country segmentation inside a trip

After trips are established, each trip is divided into country segments at:

  • an explicit tachograph border-crossing event, or
  • a reliable country change inferred from GNSS positions.

The logical hierarchy should therefore be:

Driver timeline
  └─ Trip
      ├─ Country segment 1
      ├─ Country segment 2
      └─ Country segment 3

Not:

Driver timeline
  └─ Country segments without trip boundaries

9. Recommended trip-building algorithm

A consistent implementation would be:

1. Build and sort all driving intervals per driver.
2. Build the NDI between every two consecutive driving intervals.
3. Determine location clusters.
4. Classify every NDI as HOME or NOT_HOME.
5. Build trips:
   - start with the first DI;
   - append NOT_HOME NDI and the following DI to the current trip;
   - when a HOME NDI occurs, close the current trip;
   - start a new trip with the next DI.
6. Split every resulting trip at country-border crossings.

Pseudocode:

currentTrip = new Trip(firstDI)

for every NDI between prevDI and nextDI:

    if NDI.status == NOT_HOME:
        currentTrip.add(NDI)
        currentTrip.add(nextDI)

    else:
        currentTrip.end = prevDI.end
        save(currentTrip)

        currentTrip = new Trip(nextDI)

currentTrip.end = lastDI.end
save(currentTrip)

Then:

for every trip:
    trip.segments = splitAtBorderCrossings(trip)
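
A runnable Python version of the trip-grouping step might look like this. It is a sketch of the recommended behaviour, not existing code: build_trips is a hypothetical name, trips are represented simply as lists of driving intervals, and NDI statuses are passed as plain strings.

```python
def build_trips(dis, ndi_statuses):
    """Group one driver's ordered DIs into trips.

    dis:          chronologically ordered driving intervals of one driver
    ndi_statuses: ndi_statuses[i] is "HOME" or "NOT_HOME" for the NDI
                  between dis[i] and dis[i + 1]
    Returns a list of trips, each a list of consecutive DIs.
    """
    if not dis:
        return []
    if len(ndi_statuses) != len(dis) - 1:
        raise ValueError("expected exactly one NDI between consecutive DIs")

    trips = [[dis[0]]]
    for status, next_di in zip(ndi_statuses, dis[1:]):
        if status == "NOT_HOME":
            trips[-1].append(next_di)  # same trip continues across the NDI
        else:
            trips.append([next_di])    # HOME NDI: close trip, start a new one
    return trips
```

Under this representation a trip starts at the start of its first DI and ends at the end of its last DI, so no separate end bookkeeping is needed before the border-crossing split is applied to each trip.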

10. Issues and ambiguities in the current rules

Explicit border-crossing events are mentioned but not used

The comment states that a border crossing can come from:

  • an explicit Smart Tachograph v2 event, or
  • a GNSS-derived country change.

However, the implementation scans only gnssTrace. There is no processing of explicit border-crossing events.

Vehicle identity can be incorrect for a segment

A segment may span several driving intervals and possibly several vehicles. Nevertheless, the segment stores only one vehicleId:

  • the vehicle active at the border crossing, or
  • the vehicle of the final DI for the final segment.

If a vehicle changes without a country crossing, the segment can contain activity from multiple vehicles but retain only the last vehicle ID.

HOME does not currently split segments or trips

A driver can:

  1. drive,
  2. return home,
  3. remain home for two days,
  4. begin a new journey,

and the current segment builder can still represent both journeys as one continuous segment if no country changes occur.

Position selection may hide conflicting positions

The NDI position always prefers the previous DI's end position:

previous.posEnd ?? next.posStart

When both positions exist but differ substantially, the inconsistency is ignored.

Long unknown-location intervals are assumed HOME

An NDI longer than 7.5 hours without a position is automatically HOME. This can incorrectly classify an overnight stay abroad as home when GNSS data is missing.

All rests longer than 24 hours are HOME

A driver can remain at a foreign parking place for more than 24 hours, but the rule still returns HOME. This may be intentional as a trip-reset rule, but it is not reliable as a physical-home determination.

Global company-home calculation may be dominated by dataset composition

The company-home denominator includes all qualifying NDIs across all drivers. Results can depend on:

  • the selected time period,
  • drivers with many records,
  • missing GNSS data,
  • incomplete driver histories.

Final interpretation

The document currently provides a valid algorithm for:

  • constructing NDIs between driving intervals,
  • learning frequently visited locations,
  • classifying each NDI as HOME or NOT_HOME,
  • and splitting driving history at detected country changes.

But it does not yet provide a complete trip-building algorithm.

The most consistent interpretation is:

HOME NDI     = boundary between two trips
NOT_HOME NDI = interruption inside the same trip
Border crossing = boundary between segments inside one trip

That relationship needs to be explicitly implemented because it is not present in the current run() or buildTripSegments() logic.