# EventHub Acquisition Service
Spring Boot + Apache Camel skeleton for acquiring normalized EventHub point events from multiple providers/sources.
The current version focuses on **acquisition from source systems**, especially tachograph DB data. It stores source records as imported. It does **not** merge or deduplicate equivalent events from different providers/sources. It does keep a non-unique `eventSignatureHash` as a future query/projection hint.
## Namespace
```text
at.procon.eventhub
```
## Main model decisions
### One event = one point in time
`EventHubEventDto` has exactly one timestamp:
```text
occurredAt
```
There is no generic `duration`, `endTime`, `validFrom`, or `validTo`. If a source row represents an interval, a mapper may emit separate point events such as `DRIVE START` and `DRIVE END`.
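The interval-to-point mapping can be sketched as follows. This is a minimal illustration, not the project's mapper: `ActivityRow`, `PointEvent`, and the `:start` / `:end` id suffixes are assumptions made for the example (the real `EventHubEventDto` carries more fields), and it assumes Java 16+ for records.

```java
import java.time.OffsetDateTime;
import java.util.List;

/** Hypothetical sketch: splits one interval-style source row into two point events. */
final class IntervalToPointEvents {

    /** Simplified stand-in for EventHubEventDto (assumption; the real DTO has more fields). */
    record PointEvent(String externalSourceEventId, OffsetDateTime occurredAt,
                      String eventType, String lifecycle) {}

    /** Assumed shape of an interval row in the source system. */
    record ActivityRow(String rowId, String activity,
                       OffsetDateTime startedAt, OffsetDateTime endedAt) {}

    static List<PointEvent> toPointEvents(ActivityRow row) {
        // Each point event gets its own stable id derived from the source row id.
        return List.of(
            new PointEvent(row.rowId() + ":start", row.startedAt(), row.activity(), "START"),
            new PointEvent(row.rowId() + ":end", row.endedAt(), row.activity(), "END"));
    }
}
```

Deriving both ids from the source row id keeps each point event independently addressable when the same row is imported again.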
### Tenant is package/job-level
`tenantKey` identifies the customer/data owner. It is mandatory for import packages and tachograph import requests.
### EventSource identifies the technical source
Example:
```json
{
  "providerKey": "TACHOGRAPH",
  "sourceKind": "VEHICLE_UNIT",
  "sourceKey": "TACHOGRAPH_VEHICLE_UNIT",
  "sourceInstanceKey": "main-tachograph-db",
  "tenantProviderSettingKey": "kralowetz-tachograph-prod",
  "externalFleetKey": null
}
```
Other `providerKey` / `sourceKind` (/ `sourceKey`) combinations:
```text
TACHOGRAPH / VEHICLE_UNIT
TACHOGRAPH / DRIVER_CARD
YELLOWFOX / TELEMATICS_PLATFORM / YELLOWFOX_D8
FLEETBOARD / TELEMATICS_PLATFORM / FLEETBOARD_POSITION
```
### SourceGroup is package/source grouping only
For tachograph, `sourceGroup` can identify the selected source organisation/root organisation.
```json
"sourceGroup": {
  "type": "ORGANISATION",
  "sourceEntityId": "147",
  "code": "147",
  "name": "Kralowetz"
}
```
For YellowFox, it can identify the provider fleet.
```json
"sourceGroup": {
  "type": "FLEET",
  "sourceEntityId": "7",
  "code": "7",
  "name": "YellowFox Fleet 7"
}
```
YellowFox fleet is not forced to be an organisation. It belongs to the same tenant/customer and can later be mapped or resolved through vehicle/driver master data if needed.
### ImportScope describes data selection
`importScope` describes what was selected from the source system.
Full DB import:
```json
"importScope": {
  "type": "TENANT_ALL",
  "rootSourceOrganisation": null,
  "includeChildren": false,
  "occurredFrom": null,
  "occurredTo": null
}
```
Organisation subtree + time-window import:
```json
"importScope": {
  "type": "SOURCE_ORGANISATION_SUBTREE",
  "rootSourceOrganisation": {
    "type": "ORGANISATION",
    "sourceEntityId": "147",
    "code": "147",
    "name": "Kralowetz"
  },
  "includeChildren": true,
  "occurredFrom": "2026-04-28T00:00:00+02:00",
  "occurredTo": "2026-04-29T00:00:00+02:00"
}
```
`occurredFrom` is inclusive. `occurredTo` is exclusive. Both can be `null` for complete DB/history imports.
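The half-open window semantics can be expressed as a small predicate. This is a sketch only; the class and method names are hypothetical and not part of the service.

```java
import java.time.OffsetDateTime;

/** Hypothetical helper illustrating the [occurredFrom, occurredTo) window check. */
final class OccurredWindow {

    static boolean contains(OffsetDateTime occurredFrom,
                            OffsetDateTime occurredTo,
                            OffsetDateTime occurredAt) {
        // A null bound means "unbounded" on that side (complete DB/history import).
        boolean atOrAfterStart = occurredFrom == null || !occurredAt.isBefore(occurredFrom);
        boolean beforeEnd = occurredTo == null || occurredAt.isBefore(occurredTo);
        return atOrAfterStart && beforeEnd;
    }
}
```

With an exclusive upper bound, consecutive daily windows can share a boundary timestamp without importing the boundary event twice.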
### Driver/vehicle refs do not contain organisation
Organisation assignment is a **master-data relation**, not an event property.
Events depend on driver and/or vehicle. The relation of organisation to driver/vehicle is imported and resolved separately from master data using `occurredAt`.
Driver ref:
```json
"driverRef": {
  "sourceEntityId": "driver-100",
  "driverCard": {
    "nation": "AT",
    "number": "D123456789"
  }
}
```
Vehicle ref:
```json
"vehicleRef": {
  "sourceEntityId": "vehicle-200",
  "vin": "WDB9634031L123456",
  "vehicleRegistration": {
    "nation": "AT",
    "number": "W-12345"
  }
}
```
Driver-card-only imports can carry only a nation-scoped VRN and no VIN:
```json
"vehicleRef": {
  "sourceEntityId": null,
  "vin": null,
  "vehicleRegistration": {
    "nation": "AT",
    "number": "W-12345"
  }
}
```
Later master-data resolution can connect `VRN + nation + occurredAt` to a VIN/vehicle.
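A possible shape for that later resolution step is sketched below. The `RegistrationAssignment` history record and its `validFrom` / `validTo` fields are assumptions about how master data might be stored; they belong to master data, not to events, and are not defined by this README.

```java
import java.time.OffsetDateTime;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch: resolves VRN + nation + occurredAt to a VIN via master data. */
final class VehicleResolver {

    /** Assumed master-data row: a registration was assigned to a VIN for a period. */
    record RegistrationAssignment(String nation, String number, String vin,
                                  OffsetDateTime validFrom, OffsetDateTime validTo) {}

    static Optional<String> resolveVin(List<RegistrationAssignment> history,
                                       String nation, String number,
                                       OffsetDateTime occurredAt) {
        return history.stream()
            .filter(a -> a.nation().equals(nation) && a.number().equals(number))
            // Assignment period is treated as [validFrom, validTo); null validTo = still active.
            .filter(a -> !occurredAt.isBefore(a.validFrom()))
            .filter(a -> a.validTo() == null || occurredAt.isBefore(a.validTo()))
            .map(RegistrationAssignment::vin)
            .findFirst();
    }
}
```

Returning an `Optional` leaves unresolved events untouched, which matches the rule that acquisition stores records as imported.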
### No cross-source deduplication during acquisition
The acquisition layer stores every source record independently. It uses `sourceRecordKeyHash` only for idempotency of the same source event:
```text
tenantKey + EventSource + externalSourceEventId
```
It also stores a non-unique `eventSignatureHash`. This is only a semantic hint for future query-time merging/gap filling. It is not unique and must not suppress imports.
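One way to derive such an idempotency key is sketched here. The README only names the inputs; the SHA-256 algorithm, the separator, and the choice of which `EventSource` fields take part are assumptions for illustration. `HexFormat` requires Java 17+.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/** Hypothetical sketch of a sourceRecordKeyHash; algorithm and field list are assumptions. */
final class SourceRecordKey {

    private static final String SEP = "\u001F"; // unit separator, unlikely to occur in keys

    static String hash(String tenantKey, String providerKey, String sourceKind,
                       String sourceKey, String sourceInstanceKey,
                       String externalSourceEventId) {
        String canonical = String.join(SEP,
            nz(tenantKey), nz(providerKey), nz(sourceKind),
            nz(sourceKey), nz(sourceInstanceKey), nz(externalSourceEventId));
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(canonical.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    private static String nz(String value) {
        return value == null ? "" : value;
    }
}
```

Because the key depends only on the tenant, the source, and the source event id, importing the same source record twice yields the same hash, while an equivalent event from another provider yields a different one.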
## Tachograph import job model
For real tachograph DB extraction, use a tachograph import request. It describes the job and produces an import plan. The SQL extraction routes are currently only scaffolded; implementing them is the next step.
```http
POST /api/eventhub/acquisition/tachograph/imports/plan
POST /api/eventhub/acquisition/tachograph/imports/start
```
Example: initial import from one root organisation and its children:
```json
{
  "tenantKey": "kralowetz",
  "eventSource": {
    "providerKey": "TACHOGRAPH",
    "sourceKind": "MIXED",
    "sourceKey": "TACHOGRAPH_DB",
    "sourceInstanceKey": "main-tachograph-db",
    "tenantProviderSettingKey": "kralowetz-tachograph-prod"
  },
  "sourceGroup": {
    "type": "ORGANISATION",
    "sourceEntityId": "147",
    "code": "147",
    "name": "Kralowetz"
  },
  "importScope": {
    "type": "SOURCE_ORGANISATION_SUBTREE",
    "rootSourceOrganisation": {
      "type": "ORGANISATION",
      "sourceEntityId": "147",
      "code": "147",
      "name": "Kralowetz"
    },
    "includeChildren": true,
    "occurredFrom": "2025-01-01T00:00:00+01:00",
    "occurredTo": null
  },
  "eventFamilies": [
    "DRIVER_ACTIVITY",
    "DRIVER_CARD",
    "POSITION",
    "BORDER_CROSSING",
    "LOAD_UNLOAD",
    "PLACE",
    "SPECIFIC_CONDITION",
    "SPEEDING"
  ],
  "mode": "INITIAL_BACKFILL",
  "refreshMasterDataFirst": true,
  "acquisitionStrategy": "OCCURRED_AT_WINDOW_WITH_OVERLAP"
}
```
Example: regular incremental update:
```json
{
  "tenantKey": "kralowetz",
  "eventSource": {
    "providerKey": "TACHOGRAPH",
    "sourceKind": "MIXED",
    "sourceKey": "TACHOGRAPH_DB",
    "sourceInstanceKey": "main-tachograph-db",
    "tenantProviderSettingKey": "kralowetz-tachograph-prod"
  },
  "sourceGroup": {
    "type": "ORGANISATION",
    "sourceEntityId": "147"
  },
  "importScope": {
    "type": "SOURCE_ORGANISATION_SUBTREE",
    "rootSourceOrganisation": {
      "type": "ORGANISATION",
      "sourceEntityId": "147"
    },
    "includeChildren": true,
    "occurredFrom": null,
    "occurredTo": null
  },
  "eventFamilies": ["DRIVER_ACTIVITY", "DRIVER_CARD", "POSITION", "BORDER_CROSSING", "LOAD_UNLOAD", "PLACE", "SPECIFIC_CONDITION", "SPEEDING"],
  "mode": "INCREMENTAL_UPDATE",
  "refreshMasterDataFirst": true,
  "acquisitionStrategy": "SOURCE_PACKAGE_WATERMARK"
}
```
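A client could submit such a request to the plan endpoint with the JDK HTTP client, roughly as sketched below. The base URL passed in and the `Content-Type: application/json` header are assumptions; the README does not state the host, port, or accepted media types.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical client-side sketch for building a tachograph import plan request. */
final class PlanRequestSketch {

    static HttpRequest planRequest(String baseUrl, String requestJson) {
        // baseUrl (for example "http://localhost:8080") is an assumption, not documented here.
        return HttpRequest.newBuilder(
                URI.create(baseUrl + "/api/eventhub/acquisition/tachograph/imports/plan"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(requestJson))
            .build();
    }
}
```

The built request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.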
## Tachograph extraction plan
The import-plan service currently creates extraction definitions like:
```text
DRIVER_ACTIVITY / VEHICLE_UNIT -> VUActivity
DRIVER_ACTIVITY / DRIVER_CARD -> CardActivity
DRIVER_CARD / VEHICLE_UNIT -> IWCycle
DRIVER_CARD / DRIVER_CARD -> CardVehiclesUsed
POSITION / VEHICLE_UNIT -> VUPlaces, VULoadUnload, VUGnssAccumulatedDriving, VUBorderCrossing
POSITION / DRIVER_CARD -> CardPlaces, CardLoadUnload, CardGnssAccumulatedDriving, CardBorderCrossing
BORDER_CROSSING / VEHICLE_UNIT -> VUBorderCrossing
BORDER_CROSSING / DRIVER_CARD -> CardBorderCrossing
LOAD_UNLOAD / VEHICLE_UNIT -> VULoadUnload
LOAD_UNLOAD / DRIVER_CARD -> CardLoadUnload
SPECIFIC_CONDITION / VEHICLE_UNIT -> VUSpecificCondition
SPECIFIC_CONDITION / DRIVER_CARD -> CardSpecificCondition
PLACE / VEHICLE_UNIT -> VUPlaces
PLACE / DRIVER_CARD -> CardPlaces
SPEEDING / VEHICLE_UNIT -> SpeedingEvents
```
The next implementation step is to replace the scaffolded plan items with actual Camel/JDBC SQL extraction routes.
## Acquisition alternatives considered
### Alternative A: occurred-time window import
Read events by `occurredAt` for a root organisation/time window.
Pros:
```text
simple
works for initial backfill
matches explicit from/to import requests
```
Cons:
```text
unsafe as the only incremental method because a newly imported card/VU package can contain old occurredAt data
requires overlap windows for regular updates
```
Best use:
```text
initial backfill and reprocessing
fallback incremental strategy with overlap
```
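The overlap idea for regular updates can be sketched as a window calculation. The names below and the notion of a stored "last window end" are assumptions; a suitable overlap duration depends on how late tachograph data typically arrives and is not specified here.

```java
import java.time.Duration;
import java.time.OffsetDateTime;

/** Hypothetical sketch: computes the next occurredAt window with an overlap. */
final class OverlapWindow {

    record Window(OffsetDateTime occurredFrom, OffsetDateTime occurredTo) {}

    static Window next(OffsetDateTime lastWindowEnd, OffsetDateTime now, Duration overlap) {
        // Start before the previous end so late rows inside the overlap are read again.
        // Re-read rows are expected to be absorbed by idempotent import (sourceRecordKeyHash).
        return new Window(lastWindowEnd.minus(overlap), now);
    }
}
```

Rows older than the overlap are still missed by this approach, which is why it is listed only as a fallback for incremental updates.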
### Alternative B: source-package watermark import
Read original tachograph card/VU packages that were imported/changed in the tachograph DB since the last successful EventHub run, then extract all events belonging to those packages.
Pros:
```text
best for regular updates
handles late-arriving historical tachograph packages
fits the tachograph package concept
```
Cons:
```text
requires reliable source package metadata and links from event rows to package/source download
more complex SQL and cursor state
```
Best use:
```text
primary incremental strategy if tachograph DB exposes package import timestamps/ids
```
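A watermark-based package selection might look like the following sketch. `SourcePackage`, its `importedAt` field, and the in-memory list are assumptions standing in for whatever package metadata the tachograph DB actually exposes.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: selects source packages imported after the last watermark. */
final class PackageWatermark {

    /** Assumed package metadata: an id and the time it was imported into the tachograph DB. */
    record SourcePackage(String packageId, Instant importedAt) {}

    record Selection(List<SourcePackage> packages, Instant newWatermark) {}

    static Selection selectSince(List<SourcePackage> all, Instant watermark) {
        List<SourcePackage> changed = all.stream()
            .filter(p -> p.importedAt().isAfter(watermark))
            .sorted(Comparator.comparing(SourcePackage::importedAt))
            .toList();
        // Advance the watermark only to the newest package actually selected.
        Instant next = changed.isEmpty()
            ? watermark
            : changed.get(changed.size() - 1).importedAt();
        return new Selection(changed, next);
    }
}
```

The new watermark would be persisted only after the run succeeds, so a failed run re-reads the same packages. The selection is driven by import time rather than `occurredAt`, so a package containing old events is still picked up.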
### Alternative C: source-row watermark import
Read source event rows changed since last run using row-level `updatedAt` or monotonic IDs.
Pros:
```text
precise if row update timestamps are reliable
does not require package-level model
```
Cons:
```text
not possible if source tables do not have reliable changed/updated metadata
harder across many event tables
```
Best use:
```text
fallback when rows have reliable updatedAt/row version fields
```
### Alternative D: per vehicle/per driver polling
After master-data refresh, loop through vehicles and drivers in the selected organisation subtree and read their event data.
Pros:
```text
matches the existing data acquisition pattern
naturally separates vehicle-unit and driver-card data
supports organisation-scoped imports well
```
Cons:
```text
can be slower for large fleets
requires careful batching/chunking and parallelism
can miss late old data unless combined with package/row watermark or overlap
```
Best use:
```text
scope resolution and controlled extraction, combined with Alternative A or B
```
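The batching mentioned in the cons can be as simple as partitioning the resolved vehicle or driver ids into fixed-size chunks. The helper below is a generic sketch with hypothetical names; the chunk size and any parallelism would need tuning against the real database.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: partitions ids into fixed-size chunks for batched extraction. */
final class Chunker {

    static <T> List<List<T>> chunk(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            // Copy the sub-list so each chunk is independent of the source list.
            chunks.add(List.copyOf(items.subList(i, Math.min(i + size, items.size()))));
        }
        return chunks;
    }
}
```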
## Recommended ingestion strategy
Use a hybrid:
```text
Initial import:
  master data first
  organisation subtree + occurredFrom/occurredTo
  chunk by time and/or vehicle/driver
  import idempotently by sourceRecordKeyHash

Regular update:
  master data first
  prefer source-package watermark
  fallback to occurredAt overlap window if package metadata is insufficient
  import idempotently by sourceRecordKeyHash
```
This means the EventHub acquisition package is an **extraction package**, while the original tachograph card/VU package should be preserved as source metadata in payload or later in a dedicated source-package table.
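The hybrid rule can be reduced to a small decision function. The enum values mirror the `mode` and `acquisitionStrategy` values used in the request examples; the `packageMetadataAvailable` flag and the class itself are assumptions for this sketch.

```java
/** Hypothetical sketch of the hybrid strategy selection. */
final class StrategySelector {

    enum Mode { INITIAL_BACKFILL, INCREMENTAL_UPDATE }

    enum Strategy { OCCURRED_AT_WINDOW_WITH_OVERLAP, SOURCE_PACKAGE_WATERMARK }

    static Strategy choose(Mode mode, boolean packageMetadataAvailable) {
        // Regular updates prefer the package watermark when the source exposes package metadata.
        if (mode == Mode.INCREMENTAL_UPDATE && packageMetadataAvailable) {
            return Strategy.SOURCE_PACKAGE_WATERMARK;
        }
        // Initial backfill, or fallback when package metadata is insufficient.
        return Strategy.OCCURRED_AT_WINDOW_WITH_OVERLAP;
    }
}
```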
## Existing package-level normalized event ingestion
```http
POST /api/eventhub/acquisition/packages
```
Example:
```json
{
  "package": {
    "tenantKey": "kralowetz",
    "eventSource": {
      "providerKey": "TACHOGRAPH",
      "sourceKind": "VEHICLE_UNIT",
      "sourceKey": "TACHOGRAPH_VEHICLE_UNIT",
      "sourceInstanceKey": "main-tachograph-db",
      "tenantProviderSettingKey": "kralowetz-tachograph-prod"
    },
    "sourceGroup": {
      "type": "ORGANISATION",
      "sourceEntityId": "147"
    },
    "importScope": {
      "type": "SOURCE_ORGANISATION_SUBTREE",
      "rootSourceOrganisation": {
        "type": "ORGANISATION",
        "sourceEntityId": "147"
      },
      "includeChildren": true,
      "occurredFrom": "2026-04-28T00:00:00+02:00",
      "occurredTo": "2026-04-29T00:00:00+02:00"
    },
    "eventFamily": "DRIVER_ACTIVITY",
    "businessDate": "2026-04-28",
    "externalPackageId": "TACHOGRAPH:ORG-147-SUBTREE:DRIVER_ACTIVITY:2026-04-28"
  },
  "events": [
    {
      "externalSourceEventId": "TACHOGRAPH:VEHICLE_UNIT:activity:456:start",
      "driverRef": {
        "sourceEntityId": "driver-100",
        "driverCard": {
          "nation": "AT",
          "number": "D123456789"
        }
      },
      "vehicleRef": {
        "sourceEntityId": "vehicle-200",
        "vin": "WDB9634031L123456",
        "vehicleRegistration": {
          "nation": "AT",
          "number": "W-12345"
        }
      },
      "occurredAt": "2026-04-28T08:00:00+02:00",
      "eventDomain": "DRIVER_ACTIVITY",
      "eventType": "DRIVE",
      "lifecycle": "START",
      "eventDetails": {
        "type": "DRIVER_ACTIVITY",
        "attributes": {
          "cardSlot": "DRIVER",
          "cardStatus": "INSERTED",
          "drivingStatus": "SINGLE"
        }
      },
      "payload": {
        "raw": {
          "activity": 3,
          "cardSlot": 0,
          "cardStatus": 0,
          "drivingStatus": 0
        }
      }
    }
  ]
}
```
## Routes
```text
direct:yellowfox-d8-booking-input
direct:telematics-position-input
direct:tachograph-activity-input
direct:tachograph-import-start
direct:eventhub-package-input
direct:eventhub-manual-input
```
Common route:
```text
direct:eventhub-normalized-input
-> validate EventHubEventDto
-> create package key from tenant + EventSource + sourceGroup + importScope + eventFamily
-> seda:eventhub-batch-input
-> aggregate by eventhub.packageKey
-> sort by occurredAt inside the batch
-> EventHubIngestionService.ingest(...)
```
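The aggregate-and-sort part of the common route can be illustrated without Camel. The sketch below is plain Java, not the actual route definition: the `Event` record is a simplified stand-in, and the package key is assumed to be precomputed.

```java
import java.time.OffsetDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical plain-Java illustration of "aggregate by packageKey, sort by occurredAt". */
final class BatchSketch {

    /** Simplified event stand-in; packageKey is assumed to be derived upstream. */
    record Event(String packageKey, String externalSourceEventId, OffsetDateTime occurredAt) {}

    static Map<String, List<Event>> batches(List<Event> events) {
        // Group events per package key, keeping first-seen key order.
        Map<String, List<Event>> grouped = events.stream()
            .collect(Collectors.groupingBy(Event::packageKey, LinkedHashMap::new,
                Collectors.toCollection(ArrayList::new)));
        // Sort each batch chronologically before it would be handed to ingestion.
        grouped.values().forEach(batch -> batch.sort(Comparator.comparing(Event::occurredAt)));
        return grouped;
    }
}
```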
## Start PostgreSQL
```bash
docker compose up -d
```
## Run the service
```bash
mvn spring-boot:run
```
## Check acquisition packages
```sql
select p.received_at,
p.tenant_key,
s.provider_key,
s.source_kind,
s.source_key,
p.source_group_type,
p.source_group_entity_id,
p.import_scope_type,
p.root_source_org_entity_id,
p.occurred_from,
p.occurred_to,
p.event_family,
p.business_date,
p.status,
p.event_count
from eventhub.data_package p
join eventhub.event_source s on s.id = p.event_source_id
order by p.received_at desc;
```
## Check acquired events
```sql
select occurred_at,
driver_source_entity_id,
driver_card_nation,
driver_card_number,
vehicle_source_entity_id,
vehicle_vin,
vehicle_registration_nation,
vehicle_registration_number,
event_domain,
event_type,
lifecycle,
event_signature_hash,
event_details,
payload
from eventhub.acquired_event
order by occurred_at desc;
```
## Next implementation steps
1. Add actual Camel/JDBC extraction routes behind the tachograph import plan.
2. Implement master-data acquisition first: organisation tree, driver/card assignments, vehicle VIN/VRN assignments, driver/vehicle organisation assignment histories.
3. Implement initial backfill using organisation/time scope.
4. Implement incremental import using source-package watermark, with occurredAt overlap fallback.
5. Discuss query/read models later: source priority and gap filling across tachograph, YellowFox and other sources.