EventHub Acquisition Service
Spring Boot + Apache Camel skeleton for acquiring normalized EventHub point events from multiple providers/sources.
The current version focuses on acquisition from source systems, especially tachograph DB data. It stores source records as imported. It does not merge or deduplicate equivalent events from different providers/sources. It does keep a non-unique eventSignatureHash as a future query/projection hint.
Namespace
at.procon.eventhub
Main model decisions
One event = one point in time
EventHubEventDto has exactly one timestamp:
occurredAt
There is no generic duration, endTime, validFrom, or validTo. If a source row represents an interval, a mapper may emit separate point events such as DRIVE START and DRIVE END.
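As a rough illustration of the point-event rule, the sketch below splits one interval-shaped source row into a START and an END event. The `ActivityRow` and `PointEvent` records are hypothetical stand-ins (not the real tachograph schema or the real EventHubEventDto), and the `:start` / `:end` id suffixes are an assumption modelled on the example ids used further down. Java 17+ is assumed.

```java
import java.time.OffsetDateTime;
import java.util.List;

/** Hypothetical sketch: one interval-shaped source row becomes one or two point events. */
final class IntervalToPointEvents {

    /** Illustrative source row with a start and an optional end; not the real source schema. */
    record ActivityRow(String rowId, String eventType, OffsetDateTime start, OffsetDateTime end) {}

    /** Reduced stand-in for EventHubEventDto: exactly one timestamp, occurredAt. */
    record PointEvent(String externalSourceEventId, String eventType,
                      String lifecycle, OffsetDateTime occurredAt) {}

    static List<PointEvent> toPointEvents(ActivityRow row) {
        PointEvent start = new PointEvent(row.rowId() + ":start", row.eventType(), "START", row.start());
        if (row.end() == null) {
            // Open interval: only the START point is known so far.
            return List.of(start);
        }
        PointEvent end = new PointEvent(row.rowId() + ":end", row.eventType(), "END", row.end());
        return List.of(start, end);
    }
}
```

Each emitted event keeps its own id, so START and END can be imported idempotently and independently of each other.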
Tenant is package/job-level
tenantKey identifies the customer/data owner. It is mandatory for import packages and tachograph import requests.
EventSource identifies the technical source
Example:
{
  "providerKey": "TACHOGRAPH",
  "sourceKind": "VEHICLE_UNIT",
  "sourceKey": "TACHOGRAPH_VEHICLE_UNIT",
  "sourceInstanceKey": "main-tachograph-db",
  "tenantProviderSettingKey": "kralowetz-tachograph-prod",
  "externalFleetKey": null
}
Further examples, written as providerKey / sourceKind, optionally followed by sourceKey:
- TACHOGRAPH / VEHICLE_UNIT
- TACHOGRAPH / DRIVER_CARD
- YELLOWFOX / TELEMATICS_PLATFORM / YELLOWFOX_D8
- FLEETBOARD / TELEMATICS_PLATFORM / FLEETBOARD_POSITION
SourceGroup is package/source grouping only
For tachograph, sourceGroup can identify the selected source organisation/root organisation.
"sourceGroup": {
  "type": "ORGANISATION",
  "sourceEntityId": "147",
  "code": "147",
  "name": "Kralowetz"
}
For YellowFox, it can identify the provider fleet.
"sourceGroup": {
  "type": "FLEET",
  "sourceEntityId": "7",
  "code": "7",
  "name": "YellowFox Fleet 7"
}
YellowFox fleet is not forced to be an organisation. It belongs to the same tenant/customer and can later be mapped or resolved through vehicle/driver master data if needed.
ImportScope describes data selection
importScope describes what was selected from the source system.
Full DB import:
"importScope": {
  "type": "TENANT_ALL",
  "rootSourceOrganisation": null,
  "includeChildren": false,
  "occurredFrom": null,
  "occurredTo": null
}
Organisation subtree + time-window import:
"importScope": {
  "type": "SOURCE_ORGANISATION_SUBTREE",
  "rootSourceOrganisation": {
    "type": "ORGANISATION",
    "sourceEntityId": "147",
    "code": "147",
    "name": "Kralowetz"
  },
  "includeChildren": true,
  "occurredFrom": "2026-04-28T00:00:00+02:00",
  "occurredTo": "2026-04-29T00:00:00+02:00"
}
occurredFrom is inclusive. occurredTo is exclusive. Both can be null for complete DB/history imports.
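The half-open window semantics can be sketched as a small predicate. The class and method names are hypothetical; only the inclusive/exclusive/null rules come from the text above.

```java
import java.time.OffsetDateTime;

/** Hypothetical helper: does occurredAt fall into [occurredFrom, occurredTo)? */
final class OccurredWindow {

    static boolean contains(OffsetDateTime occurredFrom, OffsetDateTime occurredTo,
                            OffsetDateTime occurredAt) {
        // A null bound means "unbounded" on that side (complete DB/history import).
        boolean notBeforeFrom = occurredFrom == null || !occurredAt.isBefore(occurredFrom); // inclusive
        boolean beforeTo = occurredTo == null || occurredAt.isBefore(occurredTo);           // exclusive
        return notBeforeFrom && beforeTo;
    }
}
```

Because `isBefore` compares instants, the check does not depend on which UTC offset a timestamp is written in. Half-open windows also let consecutive daily windows sit next to each other without importing the boundary instant twice.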
Driver/vehicle refs do not contain organisation
Organisation assignment is a master-data relation, not an event property.
Events reference a driver and/or a vehicle. The driver-to-organisation and vehicle-to-organisation relations are imported separately as master data and resolved using the event's occurredAt.
Driver ref:
"driverRef": {
  "sourceEntityId": "driver-100",
  "driverCard": {
    "nation": "AT",
    "number": "D123456789"
  }
}
Vehicle ref:
"vehicleRef": {
  "sourceEntityId": "vehicle-200",
  "vin": "WDB9634031L123456",
  "vehicleRegistration": {
    "nation": "AT",
    "number": "W-12345"
  }
}
Driver-card-only imports may carry just a nation-scoped vehicle registration number (VRN) and no VIN:
"vehicleRef": {
  "sourceEntityId": null,
  "vin": null,
  "vehicleRegistration": {
    "nation": "AT",
    "number": "W-12345"
  }
}
Later master-data resolution can connect VRN + nation + occurredAt to a VIN/vehicle.
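One possible shape of that later resolution step is sketched below. It assumes the master data provides a VRN-to-VIN assignment history with validity windows; the `VrnAssignment` record, its `validFrom`/`validTo` fields and the `resolveVin` method are hypothetical and only belong to master data, not to events.

```java
import java.time.OffsetDateTime;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch: resolve VRN + nation + occurredAt to a VIN via master-data history. */
final class VehicleResolver {

    /** Assumed master-data row: the VRN was assigned to the VIN during [validFrom, validTo). */
    record VrnAssignment(String nation, String number, String vin,
                         OffsetDateTime validFrom, OffsetDateTime validTo) {}

    static Optional<String> resolveVin(List<VrnAssignment> history, String nation,
                                       String number, OffsetDateTime occurredAt) {
        return history.stream()
                .filter(a -> a.nation().equals(nation) && a.number().equals(number))
                .filter(a -> !occurredAt.isBefore(a.validFrom()))                       // inclusive start
                .filter(a -> a.validTo() == null || occurredAt.isBefore(a.validTo()))   // exclusive end, null = still valid
                .map(VrnAssignment::vin)
                .findFirst();
    }
}
```

The same registration number can move between vehicles over time, which is why the lookup needs occurredAt and not just VRN + nation.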
No cross-source deduplication during acquisition
The acquisition layer stores every source record independently. It uses sourceRecordKeyHash only for idempotency of the same source event:
tenantKey + EventSource + externalSourceEventId
It also stores a non-unique eventSignatureHash. This is only a semantic hint for future query-time merging/gap filling. It is not unique and must not suppress imports.
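A minimal sketch of how such an idempotency key could be derived is shown below. The choice of SHA-256, the unit-separator delimiter and the exact set of EventSource fields that go into the key are assumptions made for illustration; the actual service may compute sourceRecordKeyHash differently. Java 17+ is assumed for `HexFormat`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Objects;

/** Hypothetical sketch of a sourceRecordKeyHash: tenantKey + EventSource + externalSourceEventId. */
final class SourceRecordKey {

    // Unit separator keeps "a" + "bc" distinct from "ab" + "c" (assumed delimiter).
    private static final String SEP = "\u001f";

    static String hash(String tenantKey, String providerKey, String sourceKind, String sourceKey,
                       String sourceInstanceKey, String externalSourceEventId) {
        String canonical = String.join(SEP,
                nz(tenantKey), nz(providerKey), nz(sourceKind), nz(sourceKey),
                nz(sourceInstanceKey), nz(externalSourceEventId));
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(sha256.digest(canonical.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform implementation.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    private static String nz(String value) {
        return Objects.toString(value, "");
    }
}
```

Re-importing the same source event yields the same key and can be skipped, while an equivalent event from another provider yields a different key and is stored as its own record.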
Tachograph import job model
For real tachograph DB extraction, use a tachograph import request. It describes the job and produces an import plan. The SQL extraction routes are currently only scaffolded; implementing them is the next step.
POST /api/eventhub/acquisition/tachograph/imports/plan
POST /api/eventhub/acquisition/tachograph/imports/start
Example: initial import from one root organisation and its children:
{
  "tenantKey": "kralowetz",
  "eventSource": {
    "providerKey": "TACHOGRAPH",
    "sourceKind": "MIXED",
    "sourceKey": "TACHOGRAPH_DB",
    "sourceInstanceKey": "main-tachograph-db",
    "tenantProviderSettingKey": "kralowetz-tachograph-prod"
  },
  "sourceGroup": {
    "type": "ORGANISATION",
    "sourceEntityId": "147",
    "code": "147",
    "name": "Kralowetz"
  },
  "importScope": {
    "type": "SOURCE_ORGANISATION_SUBTREE",
    "rootSourceOrganisation": {
      "type": "ORGANISATION",
      "sourceEntityId": "147",
      "code": "147",
      "name": "Kralowetz"
    },
    "includeChildren": true,
    "occurredFrom": "2025-01-01T00:00:00+01:00",
    "occurredTo": null
  },
  "eventFamilies": [
    "DRIVER_ACTIVITY",
    "DRIVER_CARD",
    "POSITION",
    "BORDER_CROSSING",
    "LOAD_UNLOAD",
    "PLACE",
    "SPECIFIC_CONDITION",
    "SPEEDING"
  ],
  "mode": "INITIAL_BACKFILL",
  "refreshMasterDataFirst": true,
  "acquisitionStrategy": "OCCURRED_AT_WINDOW_WITH_OVERLAP"
}
Example: regular incremental update:
{
  "tenantKey": "kralowetz",
  "eventSource": {
    "providerKey": "TACHOGRAPH",
    "sourceKind": "MIXED",
    "sourceKey": "TACHOGRAPH_DB",
    "sourceInstanceKey": "main-tachograph-db",
    "tenantProviderSettingKey": "kralowetz-tachograph-prod"
  },
  "sourceGroup": {
    "type": "ORGANISATION",
    "sourceEntityId": "147"
  },
  "importScope": {
    "type": "SOURCE_ORGANISATION_SUBTREE",
    "rootSourceOrganisation": {
      "type": "ORGANISATION",
      "sourceEntityId": "147"
    },
    "includeChildren": true,
    "occurredFrom": null,
    "occurredTo": null
  },
  "eventFamilies": ["DRIVER_ACTIVITY", "DRIVER_CARD", "POSITION", "BORDER_CROSSING", "LOAD_UNLOAD", "PLACE", "SPECIFIC_CONDITION", "SPEEDING"],
  "mode": "INCREMENTAL_UPDATE",
  "refreshMasterDataFirst": true,
  "acquisitionStrategy": "SOURCE_PACKAGE_WATERMARK"
}
Tachograph extraction plan
The import-plan service currently creates extraction definitions like:
DRIVER_ACTIVITY / VEHICLE_UNIT -> VUActivity
DRIVER_ACTIVITY / DRIVER_CARD -> CardActivity
DRIVER_CARD / VEHICLE_UNIT -> IWCycle
DRIVER_CARD / DRIVER_CARD -> CardVehiclesUsed
POSITION / VEHICLE_UNIT -> VUPlaces, VULoadUnload, VUGnssAccumulatedDriving, VUBorderCrossing
POSITION / DRIVER_CARD -> CardPlaces, CardLoadUnload, CardGnssAccumulatedDriving, CardBorderCrossing
BORDER_CROSSING / VEHICLE_UNIT -> VUBorderCrossing
BORDER_CROSSING / DRIVER_CARD -> CardBorderCrossing
LOAD_UNLOAD / VEHICLE_UNIT -> VULoadUnload
LOAD_UNLOAD / DRIVER_CARD -> CardLoadUnload
SPECIFIC_CONDITION / VEHICLE_UNIT -> VUSpecificCondition
SPECIFIC_CONDITION / DRIVER_CARD -> CardSpecificCondition
PLACE / VEHICLE_UNIT -> VUPlaces
PLACE / DRIVER_CARD -> CardPlaces
SPEEDING / VEHICLE_UNIT -> SpeedingEvents
The next implementation step is to replace the scaffolded plan items with actual Camel/JDBC SQL extraction routes.
Acquisition alternatives considered
Alternative A: occurred-time window import
Read events by occurredAt for a root organisation/time window.
Pros:
- simple
- works for initial backfill
- matches explicit from/to import requests
Cons:
- unsafe as the only incremental method, because a newly imported card/VU package can contain old occurredAt data
- requires overlap windows for regular updates
Best use:
- initial backfill and reprocessing
- fallback incremental strategy with overlap
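The overlap idea for Alternative A can be sketched as follows: each regular run re-reads a stretch of time before the previous run's upper bound, and idempotent import makes the re-read harmless. The names and the way the previous bound is stored are hypothetical, and the overlap length is a tuning assumption, not a value taken from the project.

```java
import java.time.Duration;
import java.time.OffsetDateTime;

/** Hypothetical sketch: next occurredAt window for a regular run, with overlap. */
final class OverlapWindow {

    /** Half-open window: occurredFrom inclusive, occurredTo exclusive. */
    record Window(OffsetDateTime occurredFrom, OffsetDateTime occurredTo) {}

    static Window next(OffsetDateTime lastOccurredTo, OffsetDateTime now, Duration overlap) {
        // Step back by the overlap so late-arriving rows with older occurredAt are re-read.
        return new Window(lastOccurredTo.minus(overlap), now);
    }
}
```

Data that arrives later than the overlap still goes unnoticed, which is the reason this strategy is listed only as a fallback for regular updates.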
Alternative B: source-package watermark import
Read original tachograph card/VU packages that were imported/changed in the tachograph DB since the last successful EventHub run, then extract all events belonging to those packages.
Pros:
- best for regular updates
- handles late-arriving historical tachograph packages
- fits the tachograph package concept
Cons:
- requires reliable source package metadata and links from event rows to package/source download
- more complex SQL and cursor state
Best use:
- primary incremental strategy if tachograph DB exposes package import timestamps/ids
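A sketch of the watermark bookkeeping for Alternative B, under the assumption that each source package carries an import timestamp in the tachograph DB. The `SourcePackage` record, its `importedAt` field and the `select` method are hypothetical; the real metadata may use ids or a different column.

```java
import java.time.OffsetDateTime;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: pick source packages imported after the last watermark. */
final class PackageWatermark {

    /** Assumed package metadata: id plus the time the package landed in the tachograph DB. */
    record SourcePackage(String packageId, OffsetDateTime importedAt) {}

    /** Packages to extract in this run and the watermark to persist after success. */
    record Result(List<SourcePackage> toExtract, OffsetDateTime newWatermark) {}

    static Result select(List<SourcePackage> packages, OffsetDateTime watermark) {
        List<SourcePackage> due = packages.stream()
                // A null watermark means no successful run yet: take everything.
                .filter(p -> watermark == null || p.importedAt().isAfter(watermark))
                .sorted(Comparator.comparing(SourcePackage::importedAt))
                .toList();
        // Keep the old watermark when nothing new arrived.
        OffsetDateTime next = due.isEmpty() ? watermark : due.get(due.size() - 1).importedAt();
        return new Result(due, next);
    }
}
```

Selection is by import time, not by occurredAt, so a card downloaded today with activity from weeks ago is still picked up. The new watermark should only be persisted once all events of the selected packages were ingested successfully.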
Alternative C: source-row watermark import
Read source event rows changed since the last run, using row-level updatedAt or monotonic IDs.
Pros:
- precise if row update timestamps are reliable
- does not require a package-level model
Cons:
- not possible if source tables do not have reliable changed/updated metadata
- harder across many event tables
Best use:
- fallback when rows have reliable updatedAt/row version fields
Alternative D: per vehicle/per driver polling
After master-data refresh, loop through vehicles and drivers in the selected organisation subtree and read their event data.
Pros:
- matches the existing data acquisition pattern
- naturally separates vehicle-unit and driver-card data
- supports organisation-scoped imports well
Cons:
- can be slower for large fleets
- requires careful batching/chunking and parallelism
- can miss late-arriving historical data unless combined with a package/row watermark or an overlap window
Best use:
- scope resolution and controlled extraction, combined with Alternative A or B
Recommended ingestion strategy
Use a hybrid:
Initial import:
- master data first
- organisation subtree + occurredFrom/occurredTo
- chunk by time and/or vehicle/driver
- import idempotently by sourceRecordKeyHash
Regular update:
- master data first
- prefer source-package watermark
- fallback to occurredAt overlap window if package metadata is insufficient
- import idempotently by sourceRecordKeyHash
In other words, an EventHub acquisition package describes one extraction run. The original tachograph card/VU package is a separate concept and should be preserved as source metadata, either in payload or, later, in a dedicated source-package table.
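The hybrid rule can be condensed into a small decision function. The enum constants reuse the mode and acquisitionStrategy values from the request examples above; the `packageMetadataAvailable` flag and the class itself are hypothetical and only illustrate the decision, not how the service actually detects usable package metadata.

```java
/** Hypothetical sketch of the hybrid strategy choice described above. */
final class StrategySelector {

    enum Mode { INITIAL_BACKFILL, INCREMENTAL_UPDATE }

    enum Strategy { OCCURRED_AT_WINDOW_WITH_OVERLAP, SOURCE_PACKAGE_WATERMARK }

    static Strategy choose(Mode mode, boolean packageMetadataAvailable) {
        if (mode == Mode.INCREMENTAL_UPDATE && packageMetadataAvailable) {
            // Regular update: prefer the source-package watermark.
            return Strategy.SOURCE_PACKAGE_WATERMARK;
        }
        // Initial backfill, or incremental fallback when package metadata is insufficient.
        return Strategy.OCCURRED_AT_WINDOW_WITH_OVERLAP;
    }
}
```

In both branches the import itself stays idempotent via sourceRecordKeyHash, so switching strategies between runs should not create duplicates.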
Existing package-level normalized event ingestion
POST /api/eventhub/acquisition/packages
Example:
{
  "package": {
    "tenantKey": "kralowetz",
    "eventSource": {
      "providerKey": "TACHOGRAPH",
      "sourceKind": "VEHICLE_UNIT",
      "sourceKey": "TACHOGRAPH_VEHICLE_UNIT",
      "sourceInstanceKey": "main-tachograph-db",
      "tenantProviderSettingKey": "kralowetz-tachograph-prod"
    },
    "sourceGroup": {
      "type": "ORGANISATION",
      "sourceEntityId": "147"
    },
    "importScope": {
      "type": "SOURCE_ORGANISATION_SUBTREE",
      "rootSourceOrganisation": {
        "type": "ORGANISATION",
        "sourceEntityId": "147"
      },
      "includeChildren": true,
      "occurredFrom": "2026-04-28T00:00:00+02:00",
      "occurredTo": "2026-04-29T00:00:00+02:00"
    },
    "eventFamily": "DRIVER_ACTIVITY",
    "businessDate": "2026-04-28",
    "externalPackageId": "TACHOGRAPH:ORG-147-SUBTREE:DRIVER_ACTIVITY:2026-04-28"
  },
  "events": [
    {
      "externalSourceEventId": "TACHOGRAPH:VEHICLE_UNIT:activity:456:start",
      "driverRef": {
        "sourceEntityId": "driver-100",
        "driverCard": {
          "nation": "AT",
          "number": "D123456789"
        }
      },
      "vehicleRef": {
        "sourceEntityId": "vehicle-200",
        "vin": "WDB9634031L123456",
        "vehicleRegistration": {
          "nation": "AT",
          "number": "W-12345"
        }
      },
      "occurredAt": "2026-04-28T08:00:00+02:00",
      "eventDomain": "DRIVER_ACTIVITY",
      "eventType": "DRIVE",
      "lifecycle": "START",
      "eventDetails": {
        "type": "DRIVER_ACTIVITY",
        "attributes": {
          "cardSlot": "DRIVER",
          "cardStatus": "INSERTED",
          "drivingStatus": "SINGLE"
        }
      },
      "payload": {
        "raw": {
          "activity": 3,
          "cardSlot": 0,
          "cardStatus": 0,
          "drivingStatus": 0
        }
      }
    }
  ]
}
Routes
direct:yellowfox-d8-booking-input
direct:telematics-position-input
direct:tachograph-activity-input
direct:tachograph-import-start
direct:eventhub-package-input
direct:eventhub-manual-input
Common route:
direct:eventhub-normalized-input
  -> validate EventHubEventDto
  -> create package key from tenant + EventSource + sourceGroup + importScope + eventFamily
  -> seda:eventhub-batch-input
  -> aggregate by eventhub.packageKey
  -> sort by occurredAt inside the batch
  -> EventHubIngestionService.ingest(...)
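The aggregate-and-sort part of the common route can be illustrated in plain Java, without Camel. This is not the actual route definition; it only shows the intended result: one batch per package key, each batch ordered by occurredAt. The `Event` record is a reduced, hypothetical stand-in, and how the package key is derived is not shown.

```java
import java.time.OffsetDateTime;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical plain-Java illustration of "aggregate by packageKey, sort by occurredAt". */
final class BatchGrouping {

    /** Reduced stand-in for a validated event that already carries its package key. */
    record Event(String packageKey, String externalSourceEventId, OffsetDateTime occurredAt) {}

    static Map<String, List<Event>> groupAndSort(List<Event> events) {
        return events.stream()
                // Sorting first means every per-key list ends up ordered by occurredAt.
                .sorted(Comparator.comparing(Event::occurredAt))
                .collect(Collectors.groupingBy(Event::packageKey, LinkedHashMap::new, Collectors.toList()));
    }
}
```

Each resulting list corresponds to what would be handed to EventHubIngestionService.ingest(...) as one batch.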
Start PostgreSQL
docker compose up -d
Run the service
mvn spring-boot:run
Check acquisition packages
select p.received_at,
       p.tenant_key,
       s.provider_key,
       s.source_kind,
       s.source_key,
       p.source_group_type,
       p.source_group_entity_id,
       p.import_scope_type,
       p.root_source_org_entity_id,
       p.occurred_from,
       p.occurred_to,
       p.event_family,
       p.business_date,
       p.status,
       p.event_count
from eventhub.data_package p
join eventhub.event_source s on s.id = p.event_source_id
order by p.received_at desc;
Check acquired events
select occurred_at,
       driver_source_entity_id,
       driver_card_nation,
       driver_card_number,
       vehicle_source_entity_id,
       vehicle_vin,
       vehicle_registration_nation,
       vehicle_registration_number,
       event_domain,
       event_type,
       lifecycle,
       event_signature_hash,
       event_details,
       payload
from eventhub.acquired_event
order by occurred_at desc;
Next implementation steps
- Add actual Camel/JDBC extraction routes behind the tachograph import plan.
- Implement master-data acquisition first: organisation tree, driver/card assignments, vehicle VIN/VRN assignments, driver/vehicle organisation assignment histories.
- Implement initial backfill using organisation/time scope.
- Implement incremental import using source-package watermark, with occurredAt overlap fallback.
- Discuss query/read models later: source priority and gap filling across tachograph, YellowFox and other sources.