# 1. Terminology and source data
The algorithm uses data from:
- `M_`: vehicle-unit tachograph data, especially GNSS positions.
- `C_`: driver-card data, especially activities and card insertion/removal events.
The main objects are:
## DI — Driving Interval
A continuous interval in which the driver is driving.
It contains:
- driver
- vehicle
- start and end time
- start and end positions
- optional GNSS trace points
## NDI — Non-Driving Interval
The gap between two consecutive driving intervals.
It contains:
- driver
- vehicle before the interval
- vehicle after the interval
- start and end time
- inferred position
- card-removal interval
- location cluster
- classification: `HOME` or `NOT_HOME`
In this specification, `HOME` does not necessarily mean the driver's private residence. It means that the interruption is treated as a home/base interruption. It can also represent:
- a company depot,
- a vehicle change,
- a long card-removal period,
- or simply a rest longer than 24 hours.
# 2. How an NDI is built
Driving intervals are grouped by driver and ordered chronologically.
For every pair of consecutive driving intervals:
```text
previous driving interval
non-driving interval
next driving interval
```
The NDI is created as follows:
```text
NDI.start = previous DI.end
NDI.end = next DI.start
```
The vehicles are:
```text
NDI.vehicleStart = previous DI.vehicleId
NDI.vehicleEnd = next DI.vehicleId
```
The position is selected using:
```text
previous DI end position
otherwise
next DI start position
otherwise
no position
```
So the exact position rule is:
```text
NDI.pos = previous.posEnd ?? next.posStart
```
The driver-card events between the two driving intervals are used to find the card-removal interval.
## Important limitations
The algorithm creates NDIs only between consecutive driving intervals. It does not create:
- an NDI before the first driving interval,
- an NDI after the final driving interval,
- or an NDI when only card/activity data exists without surrounding driving intervals.
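The construction rules above can be sketched in a few lines of Python. This is a minimal illustration, not the actual implementation: the `DrivingInterval` and `NonDrivingInterval` classes, their field names, and `build_ndis` are hypothetical, positions are assumed to be `(latitude, longitude)` tuples, and the card-removal interval is left out.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import groupby
from typing import Optional, Tuple

Position = Tuple[float, float]  # (latitude, longitude); an assumed representation


@dataclass
class DrivingInterval:  # hypothetical model of a DI
    driver_id: str
    vehicle_id: str
    start: datetime
    end: datetime
    pos_start: Optional[Position] = None
    pos_end: Optional[Position] = None


@dataclass
class NonDrivingInterval:  # hypothetical model of an NDI (card data omitted)
    driver_id: str
    vehicle_start: str
    vehicle_end: str
    start: datetime
    end: datetime
    pos: Optional[Position]


def build_ndis(intervals):
    """Create one NDI for every pair of consecutive DIs of the same driver."""
    ordered = sorted(intervals, key=lambda di: (di.driver_id, di.start))
    ndis = []
    for _, group in groupby(ordered, key=lambda di: di.driver_id):
        dis = list(group)
        # zip() pairs each DI with its successor, so no NDI is produced
        # before the first DI or after the last one.
        for prev, nxt in zip(dis, dis[1:]):
            # Equivalent of: previous.posEnd ?? next.posStart
            pos = prev.pos_end if prev.pos_end is not None else nxt.pos_start
            ndis.append(NonDrivingInterval(
                driver_id=prev.driver_id,
                vehicle_start=prev.vehicle_id,
                vehicle_end=nxt.vehicle_id,
                start=prev.end,
                end=nxt.start,
                pos=pos,
            ))
    return ndis
```

A driver with a single driving interval yields no NDI at all, which matches the limitations listed above.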
# 3. How home locations are learned
Only NDIs satisfying both conditions are used for location learning:
```text
duration > 7.5 hours
position is known
```
Exactly 7.5 hours does not qualify because the comparison is strictly `>`.
## 3.1 Location clustering
The positions are clustered globally with DBSCAN:
| Parameter | Value |
|---|---:|
| Maximum cluster distance | 150 metres |
| Minimum points | 3 |
| Distance calculation | Haversine/PostGIS |
| Unclustered points | `NOISE` |
All drivers' qualifying NDIs appear to be clustered together in one global clustering operation.
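To make the parameters concrete, the following is a small pure-Python DBSCAN sketch using the values from the table. It is only an illustration under stated assumptions: the real clustering may run in PostGIS, the function names `haversine_m` and `dbscan` are hypothetical, and the point itself is assumed to count towards the minimum of 3 points.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius; an assumed constant
EPS_M = 150.0                  # maximum cluster distance from the table
MIN_POINTS = 3                 # minimum points from the table
NOISE = "NOISE"


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def dbscan(points, eps=EPS_M, min_points=MIN_POINTS):
    """Return one label per point: a cluster number (0, 1, ...) or NOISE."""
    def neighbours(i):
        # The point itself is included in its own neighbourhood (assumption).
        return [j for j, q in enumerate(points) if haversine_m(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_points:
            labels[i] = NOISE            # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_points:
                queue.extend(more)       # j is a core point: keep expanding
    return labels
```

The neighbour search here is quadratic, which is acceptable for a sketch but would need a spatial index for a large dataset.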
## 3.2 Company-home locations
A cluster becomes a company-home location, normally interpreted as a depot, when:
```text
visits to cluster / all long positioned NDIs > 25%
```
The denominator is all NDIs in the complete dataset that:
- are longer than 7.5 hours, and
- have a position.
Noise points are excluded as possible company-home clusters.
## 3.3 Driver-home locations
For each driver, a cluster becomes a private driver-home location when:
```text
driver's visits to cluster /
driver's long positioned NDIs > 25%
```
Additionally:
```text
cluster must not already be a company-home cluster
```
Therefore, a location used frequently by the whole company is classified as a company depot rather than a private driver home.
## Exact threshold behaviour
The comparisons are strict:
- exactly 25% is not enough;
- the share must be greater than 25%.
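The two share rules can be sketched as follows. The function `learn_home_clusters` and its input shape are hypothetical: it assumes each qualifying NDI (longer than 7.5 hours, with a position) has already been reduced to a `(driver_id, cluster_label)` pair. Excluding `NOISE` from driver-home candidates is also an assumption, since the rules above state this only for company-home clusters. Shares are compared as exact fractions so that exactly 25% never passes because of rounding.

```python
from collections import Counter, defaultdict
from fractions import Fraction

NOISE = "NOISE"
HOME_SHARE = Fraction(1, 4)   # strict 25% threshold


def learn_home_clusters(visits):
    """visits: (driver_id, cluster_label) for every long, positioned NDI."""
    total = len(visits)
    per_cluster = Counter(label for _, label in visits)
    # Company-home: share of ALL qualifying NDIs in the dataset.
    company = {
        label for label, n in per_cluster.items()
        if label != NOISE and Fraction(n, total) > HOME_SHARE
    }

    per_driver = defaultdict(Counter)
    for driver, label in visits:
        per_driver[driver][label] += 1

    driver_homes = {}
    for driver, counts in per_driver.items():
        # Noise visits stay in the driver's denominator.
        driver_total = sum(counts.values())
        driver_homes[driver] = {
            label for label, n in counts.items()
            if label != NOISE                 # assumption: noise is never a home
            and label not in company          # company depots take precedence
            and Fraction(n, driver_total) > HOME_SHARE
        }
    return company, driver_homes
```

With this input shape, a cluster visited in exactly one quarter of a driver's qualifying NDIs is not a driver home, consistent with the strict comparison.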
# 4. Rules for determining whether an NDI is HOME or NOT_HOME
The rules are evaluated in a fixed priority order. The first matching rule wins.
## Decision table
| Priority | Condition | Result |
|---:|---|---|
| A | Vehicle before NDI differs from vehicle after NDI | `HOME` |
| B | Card is removed for more than 80% of the NDI | `HOME` |
| C | NDI duration is more than 24 hours | `HOME` |
| D1 | Position unknown and duration more than 7.5 hours | `HOME` |
| D2 | Position unknown and duration no more than 7.5 hours | `NOT_HOME` |
| E1 | Position known, duration more than 7.5 hours, and position belongs to company-home or driver-home cluster | `HOME` |
| E2 | Position known, duration more than 7.5 hours, but position is not a recognised home cluster | `NOT_HOME` |
| F | All remaining short NDIs | `NOT_HOME` |
## Rule A: Change of vehicle
```text
vehicleStart != vehicleEnd → HOME
```
When the next driving interval starts in another vehicle, the NDI is always treated as `HOME`.
This rule has the highest priority. It applies even when:
- the NDI is short,
- the position is known to be away from home,
- or the driver card remained inserted for much of the interval.
This is therefore a technical trip-separation rule rather than proof that the driver was physically at home.
## Rule B: Card removed for more than 80%
```text
cardOut duration > 80% of NDI duration → HOME
```
Exactly 80% is not sufficient.
Example:
```text
NDI duration: 10 hours
Card removed: 8 hours
Result: NOT enough for Rule B
Card removed: 8 hours 1 minute
Result: HOME
```
The data model contains only one `cardOut` interval. It is not defined how several card-removal periods inside one NDI should be combined.
## Rule C: NDI longer than 24 hours
```text
NDI duration > 24 hours → HOME
```
Exactly 24 hours is not sufficient.
This rule overrides the position logic. Even when the driver remains at an unrecognised remote location, an NDI longer than 24 hours is classified as `HOME`.
## Rule D: No known position
When the position cannot be determined:
```text
duration > 7.5 hours → HOME
duration <= 7.5 hours → NOT_HOME
```
This is a fallback assumption. A long rest without location evidence is assumed to be home.
## Rule E: Long NDI with a known position
For an NDI longer than 7.5 hours:
```text
position in company-home cluster → HOME
position in driver's home cluster → HOME
otherwise → NOT_HOME
```
The specification describes the final case as an overnight stay in the vehicle, although the data itself only establishes that the rest occurred away from a recognised home location.
## Rule F: Short NDI
A short NDI defaults to:
```text
NOT_HOME
```
However, a short NDI can still be classified as `HOME` by the earlier rules:
- vehicle changed, or
- card removed for more than 80% of the interval.
# 5. Compact HOME/NOT_HOME decision flow
```text
Did the vehicle change?
├─ Yes → HOME
└─ No
   ├─ Was the card removed for more than 80% of the NDI?
   │  ├─ Yes → HOME
   │  └─ No
   ├─ Is the NDI longer than 24 hours?
   │  ├─ Yes → HOME
   │  └─ No
   ├─ Is the position missing?
   │  ├─ Yes and duration > 7.5 h → HOME
   │  ├─ Yes and duration <= 7.5 h → NOT_HOME
   │  └─ No
   └─ Is the NDI longer than 7.5 hours?
      ├─ No → NOT_HOME
      └─ Yes
         ├─ Company-home cluster → HOME
         ├─ Driver-home cluster → HOME
         └─ Other location/noise → NOT_HOME
```
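The same priority order can be written as a single function. This is a sketch under assumptions, not the actual code: the `Ndi` class, its fields, and `classify` are hypothetical names, and the card-removal time is assumed to be available as one total duration. The 80% test is done by cross-multiplication to avoid floating-point rounding at the exact threshold.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

LONG_REST = timedelta(hours=7.5)
VERY_LONG_REST = timedelta(hours=24)


@dataclass
class Ndi:  # hypothetical, reduced to the fields the rules need
    vehicle_start: str
    vehicle_end: str
    duration: timedelta
    card_out: timedelta = timedelta(0)
    has_position: bool = True
    cluster: Optional[int] = None    # None when the position is unclustered


def classify(ndi, company_homes, driver_homes):
    """Evaluate rules A to F in priority order; the first match wins."""
    if ndi.vehicle_start != ndi.vehicle_end:            # Rule A
        return "HOME"
    if ndi.card_out * 5 > ndi.duration * 4:             # Rule B: strictly > 80%
        return "HOME"
    if ndi.duration > VERY_LONG_REST:                   # Rule C
        return "HOME"
    is_long = ndi.duration > LONG_REST
    if not ndi.has_position:                            # Rules D1 and D2
        return "HOME" if is_long else "NOT_HOME"
    if is_long and (ndi.cluster in company_homes or ndi.cluster in driver_homes):
        return "HOME"                                   # Rule E1
    return "NOT_HOME"                                   # Rules E2 and F
```

Here `driver_homes` is meant to be the set of home clusters of the driver the NDI belongs to, so the caller is responsible for selecting the right set.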
# 6. How trip segments are currently determined
The code does not create a trip object. It creates `TripSegment` records based on country changes.
Each segment contains:
- driver
- vehicle
- start and end time
- country before and after the boundary
- start and end positions
The algorithm works as follows:
1. Take all driving intervals belonging to one driver.
2. Sort them chronologically.
3. Start at the beginning of the first driving interval.
4. Determine the country of the first position.
5. Scan all GNSS trace points in all driving intervals.
6. Reverse-geocode every trace point to a country.
7. When the country changes:
   - close the current segment at that trace-point timestamp;
   - store the old and new country;
   - begin a new segment at the same trace point.
8. After all trace points, close the final segment at the end of the final driving interval.
Example:
```text
08:00 Driving starts in Austria
10:15 GNSS position changes from Austria to Germany
14:00 GNSS position changes from Germany to France
18:00 Final driving interval ends
```
Segments produced:
```text
Segment 1: 08:00–10:15, Austria → Germany
Segment 2: 10:15–14:00, Germany → France
Segment 3: 14:00–18:00, France → France
```
The final segment has the same `countryFrom` and `countryTo` because no further border was crossed.
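The splitting behaviour described in this section can be reproduced with a short sketch. It is a simplification under assumptions: `build_segments` is a hypothetical stand-in for the real segment builder, the trace points are assumed to be already reverse-geocoded into `(timestamp, country)` pairs, and driver, vehicle, and position fields are omitted. Returning no segments for an empty trace is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TripSegment:  # reduced to time and country fields for illustration
    start: datetime
    end: datetime
    country_from: str
    country_to: str


def build_segments(first_start, last_end, trace):
    """trace: chronological (timestamp, country) pairs from all DIs of one driver."""
    if not trace:
        return []                    # assumption: nothing to segment without GNSS
    segments = []
    seg_start = first_start
    current = trace[0][1]            # country of the first position
    for ts, country in trace[1:]:
        if country != current:
            # Close the running segment at the crossing and start the next one
            # at the same trace point.
            segments.append(TripSegment(seg_start, ts, current, country))
            seg_start, current = ts, country
    # The final segment is closed at the end of the last driving interval.
    segments.append(TripSegment(seg_start, last_end, current, current))
    return segments


day = lambda hh, mm=0: datetime(2024, 1, 1, hh, mm)
example = build_segments(
    day(8), day(18),
    [(day(8), "Austria"), (day(10, 15), "Germany"), (day(14), "France")],
)
```

Run on the example timeline, `example` contains the three segments listed above, with the last one going from France to France.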
# 7. What currently determines a “trip”
Strictly speaking, the specification does not determine individual trips.
For each driver, the segment-building function starts at:
```text
first driving interval start
```
and continues until:
```text
last driving interval end
```
It splits this complete period only at country changes.
This means that if the input covers an entire month, the algorithm may effectively process the whole month as one continuous sequence of country segments, even when the driver returned home several times.
The `HOME` and `NOT_HOME` classifications are not passed into `buildTripSegments()`. In fact, trip segments are built before the NDIs are classified:
```text
build NDIs
build trip segments
cluster NDIs
determine home locations
classify NDIs
```
Consequently:
```text
HOME NDI does not end a trip
NOT_HOME NDI does not explicitly continue a trip
```
The two parts of the algorithm are currently disconnected.
# 8. Likely intended trip definition
Based on the purpose of the HOME/NOT_HOME classification, the intended definition is most likely:
> A trip is a maximal chronological sequence of driving intervals separated only by `NOT_HOME` NDIs. A `HOME` NDI closes the current trip and separates it from the next trip.
That would produce the following rules:
## Trip start
A trip begins:
- at the first available driving interval, or
- at the first driving interval following a `HOME` NDI.
## Trip continuation
The same trip continues across an NDI when:
```text
NDI.status = NOT_HOME
```
This includes:
- short breaks,
- overnight rests away from recognised home locations,
- rest in the vehicle,
- long rests at remote locations up to 24 hours.
## Trip end
A trip ends at the end of the driving interval preceding an NDI when:
```text
NDI.status = HOME
```
The next driving interval begins a new trip.
## Country segmentation inside a trip
After trips are established, each trip is divided into country segments at:
- an explicit tachograph border-crossing event, or
- a reliable country change inferred from GNSS positions.
The logical hierarchy should therefore be:
```text
Driver timeline
└─ Trip
   ├─ Country segment 1
   ├─ Country segment 2
   └─ Country segment 3
```
Not:
```text
Driver timeline
└─ Country segments without trip boundaries
```
# 9. Recommended trip-building algorithm
A consistent implementation would be:
```text
1. Build and sort all driving intervals per driver.
2. Build the NDI between every two consecutive driving intervals.
3. Determine location clusters.
4. Classify every NDI as HOME or NOT_HOME.
5. Build trips:
   - start with the first DI;
   - append NOT_HOME NDI and the following DI to the current trip;
   - when a HOME NDI occurs, close the current trip;
   - start a new trip with the next DI.
6. Split every resulting trip at country-border crossings.
```
Pseudocode:
```text
currentTrip = new Trip(firstDI)
for every NDI between prevDI and nextDI:
    if NDI.status == NOT_HOME:
        currentTrip.add(NDI)
        currentTrip.add(nextDI)
    else:
        currentTrip.end = prevDI.end
        save(currentTrip)
        currentTrip = new Trip(nextDI)
save(currentTrip)
```
Then:
```text
for every trip:
    trip.segments = splitAtBorderCrossings(trip)
```
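For a runnable version of the trip-grouping step, a minimal Python sketch could look like this. It is not part of the current implementation: `build_trips` is a hypothetical name, a trip is represented simply as a list of driving intervals, and the NDI classifications are assumed to arrive as a list in which entry `i` describes the NDI between `dis[i]` and `dis[i + 1]`. Country segmentation of each trip is left out.

```python
def build_trips(dis, ndi_statuses):
    """Group one driver's chronological DIs into trips.

    dis          -- driving intervals in chronological order (any objects)
    ndi_statuses -- "HOME" / "NOT_HOME" for the NDI after each DI except the last
    """
    if not dis:
        return []
    if len(ndi_statuses) != len(dis) - 1:
        raise ValueError("expected exactly one NDI status between consecutive DIs")

    trips = []
    current = [dis[0]]                    # a trip starts with the first DI
    for status, next_di in zip(ndi_statuses, dis[1:]):
        if status == "NOT_HOME":
            current.append(next_di)       # interruption inside the same trip
        else:
            trips.append(current)         # HOME closes the current trip
            current = [next_di]           # and the next DI opens a new one
    trips.append(current)                 # the last trip is always saved
    return trips


trips = build_trips(
    ["DI1", "DI2", "DI3", "DI4", "DI5"],
    ["NOT_HOME", "HOME", "NOT_HOME", "HOME"],
)
```

In this example the two `HOME` NDIs split the five driving intervals into three trips: `DI1` and `DI2`, then `DI3` and `DI4`, then `DI5` alone.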
# 10. Issues and ambiguities in the current rules
## Explicit border-crossing events are mentioned but not used
The comment states that a border crossing can come from:
- an explicit Smart Tachograph v2 event, or
- a GNSS-derived country change.
However, the implementation scans only `gnssTrace`. There is no processing of explicit border-crossing events.
## Vehicle identity can be incorrect for a segment
A segment may span several driving intervals and possibly several vehicles. Nevertheless, the segment stores only one `vehicleId`:
- the vehicle active at the border crossing, or
- the vehicle of the final DI for the final segment.
If a vehicle changes without a country crossing, the segment can contain activity from multiple vehicles but retain only the last vehicle ID.
## HOME does not currently split segments or trips
A driver can:
1. drive,
2. return home,
3. remain home for two days,
4. begin a new journey,
and the current segment builder can still represent both journeys as one continuous segment if no country changes occur.
## Position selection may hide conflicting positions
The NDI position always prefers the previous DI's end position:
```text
previous.posEnd ?? next.posStart
```
When both positions exist but differ substantially, the inconsistency is ignored.
## Long unknown-location intervals are assumed HOME
An NDI longer than 7.5 hours without a position is automatically `HOME`. This can incorrectly classify an overnight stay abroad as home when GNSS data is missing.
## All rests longer than 24 hours are HOME
A driver can remain at a foreign parking place for more than 24 hours, but the rule still returns `HOME`. This may be intentional as a trip-reset rule, but it is not reliable as a physical-home determination.
## Global company-home calculation may be dominated by dataset composition
The company-home denominator includes all qualifying NDIs across all drivers. Results can depend on:
- the selected time period,
- drivers with many records,
- missing GNSS data,
- incomplete driver histories.
# Final interpretation
The document currently provides a valid algorithm for:
- constructing NDIs between driving intervals,
- learning frequently visited locations,
- classifying each NDI as `HOME` or `NOT_HOME`,
- and splitting driving history at detected country changes.
But it does **not yet provide a complete trip-building algorithm**.
The most consistent interpretation is:
```text
HOME NDI = boundary between two trips
NOT_HOME NDI = interruption inside the same trip
Border crossing = boundary between segments inside one trip
```
That relationship needs to be explicitly implemented because it is not present in the current `run()` or `buildTripSegments()` logic.