EventHub Acquisition Service

Spring Boot + Apache Camel skeleton for acquiring normalized EventHub point events from multiple providers/sources.

The current version focuses on acquisition from source systems, especially tachograph DB data. It stores source records as imported and does not merge or deduplicate equivalent events from different providers/sources. It does, however, keep a non-unique eventSignatureHash as a hint for future queries/projections.

Namespace

at.procon.eventhub

Main model decisions

One event = one point in time

EventHubEventDto has exactly one timestamp:

occurredAt

There is no generic duration, endTime, validFrom, or validTo. If a source row represents an interval, a mapper may emit separate point events such as DRIVE START and DRIVE END.
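A minimal Java sketch of such a mapper follows. It assumes a simplified, hypothetical PointEvent record in place of the full EventHubEventDto. The mapper turns one interval row into a START and an END point event and gives each its own externalSourceEventId, so re-importing the row stays idempotent. The `:start`/`:end` suffix follows the style of the id in the package ingestion example but is an assumption here.

```java
import java.time.OffsetDateTime;
import java.util.List;

final class IntervalToPointEvents {
    private IntervalToPointEvents() {}

    /** Hypothetical, simplified point event; the real EventHubEventDto carries many more fields. */
    record PointEvent(String externalSourceEventId, OffsetDateTime occurredAt,
                      String eventType, String lifecycle) {}

    /**
     * Emits two point events for one source interval: START at the interval start and END at the
     * interval end. Each gets its own stable external id derived from the source row id.
     */
    static List<PointEvent> toPointEvents(String sourceRowId, String eventType,
                                          OffsetDateTime start, OffsetDateTime end) {
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("interval ends before it starts: " + sourceRowId);
        }
        return List.of(
                new PointEvent(sourceRowId + ":start", start, eventType, "START"),
                new PointEvent(sourceRowId + ":end", end, eventType, "END"));
    }

    public static void main(String[] args) {
        List<PointEvent> events = toPointEvents("TACHOGRAPH:VEHICLE_UNIT:activity:456", "DRIVE",
                OffsetDateTime.parse("2026-04-28T08:00:00+02:00"),
                OffsetDateTime.parse("2026-04-28T10:15:00+02:00"));
        events.forEach(System.out::println);
    }
}
```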

Tenant is package/job-level

tenantKey identifies the customer/data owner. It is mandatory for import packages and tachograph import requests.

EventSource identifies the technical source

Example:

{
  "providerKey": "TACHOGRAPH",
  "sourceKind": "VEHICLE_UNIT",
  "sourceKey": "TACHOGRAPH_VEHICLE_UNIT",
  "sourceInstanceKey": "main-tachograph-db",
  "tenantProviderSettingKey": "kralowetz-tachograph-prod",
  "externalFleetKey": null
}

Examples (providerKey / sourceKind / sourceKey):

TACHOGRAPH / VEHICLE_UNIT
TACHOGRAPH / DRIVER_CARD
YELLOWFOX / TELEMATICS_PLATFORM / YELLOWFOX_D8
FLEETBOARD / TELEMATICS_PLATFORM / FLEETBOARD_POSITION

SourceGroup is package/source grouping only

For tachograph, sourceGroup can identify the selected source organisation/root organisation.

"sourceGroup": {
  "type": "ORGANISATION",
  "sourceEntityId": "147",
  "code": "147",
  "name": "Kralowetz"
}

For YellowFox, it can identify the provider fleet.

"sourceGroup": {
  "type": "FLEET",
  "sourceEntityId": "7",
  "code": "7",
  "name": "YellowFox Fleet 7"
}

A YellowFox fleet is not required to be an organisation. It belongs to the same tenant/customer and can later be mapped or resolved through vehicle/driver master data if needed.

ImportScope describes data selection

importScope describes what was selected from the source system.

Full DB import:

"importScope": {
  "type": "TENANT_ALL",
  "rootSourceOrganisation": null,
  "includeChildren": false,
  "occurredFrom": null,
  "occurredTo": null
}

Organisation subtree + time-window import:

"importScope": {
  "type": "SOURCE_ORGANISATION_SUBTREE",
  "rootSourceOrganisation": {
    "type": "ORGANISATION",
    "sourceEntityId": "147",
    "code": "147",
    "name": "Kralowetz"
  },
  "includeChildren": true,
  "occurredFrom": "2026-04-28T00:00:00+02:00",
  "occurredTo": "2026-04-29T00:00:00+02:00"
}

occurredFrom is inclusive and occurredTo is exclusive, so the window is [occurredFrom, occurredTo). A null bound means the window is open on that side; both are null for complete DB/history imports.
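A small sketch of the window check with nullable bounds, in plain Java with hypothetical class and method names. OffsetDateTime.isBefore compares instants, so bounds and events carrying different UTC offsets are handled consistently.

```java
import java.time.OffsetDateTime;

final class OccurredWindow {
    private OccurredWindow() {}

    /**
     * True if occurredAt lies in [occurredFrom, occurredTo).
     * A null bound is treated as open on that side; two nulls select the complete history.
     */
    static boolean contains(OffsetDateTime occurredFrom, OffsetDateTime occurredTo,
                            OffsetDateTime occurredAt) {
        boolean onOrAfterFrom = occurredFrom == null || !occurredAt.isBefore(occurredFrom); // inclusive
        boolean beforeTo = occurredTo == null || occurredAt.isBefore(occurredTo);          // exclusive
        return onOrAfterFrom && beforeTo;
    }

    public static void main(String[] args) {
        OffsetDateTime from = OffsetDateTime.parse("2026-04-28T00:00:00+02:00");
        OffsetDateTime to = OffsetDateTime.parse("2026-04-29T00:00:00+02:00");
        System.out.println(contains(from, to, from)); // true: occurredFrom is inclusive
        System.out.println(contains(from, to, to));   // false: occurredTo is exclusive
        System.out.println(contains(null, null, to)); // true: no bounds, complete history
    }
}
```

Because the upper bound is exclusive, consecutive daily windows like the one above neither overlap nor leave a gap at midnight.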

Driver/vehicle refs do not contain organisation

Organisation assignment is a master-data relation, not an event property.

Events reference a driver and/or a vehicle. The organisation assignment of a driver or vehicle is imported separately as master data and resolved at the event's occurredAt.
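A minimal sketch of that resolution step. It assumes a hypothetical assignment-history record with half-open validity [validFrom, validTo), where validTo null means "still valid", and non-overlapping assignments per driver/vehicle. The actual master-data model is not implemented yet, so all names here are illustrative.

```java
import java.time.OffsetDateTime;
import java.util.List;
import java.util.Optional;

final class OrganisationResolver {
    private OrganisationResolver() {}

    /** Hypothetical master-data row: a driver or vehicle belongs to an organisation in [validFrom, validTo). */
    record OrganisationAssignment(String subjectSourceEntityId, String organisationSourceEntityId,
                                  OffsetDateTime validFrom, OffsetDateTime validTo) {
        boolean covers(OffsetDateTime at) {
            return !at.isBefore(validFrom) && (validTo == null || at.isBefore(validTo));
        }
    }

    /** Organisation of the given driver/vehicle at occurredAt; empty if no assignment covers it. */
    static Optional<String> organisationAt(List<OrganisationAssignment> history,
                                           String subjectSourceEntityId, OffsetDateTime occurredAt) {
        return history.stream()
                .filter(a -> a.subjectSourceEntityId().equals(subjectSourceEntityId))
                .filter(a -> a.covers(occurredAt))
                .map(OrganisationAssignment::organisationSourceEntityId)
                .findFirst();
    }

    public static void main(String[] args) {
        // Sample data only: vehicle-200 moves from organisation 147 to organisation 150.
        OffsetDateTime switchAt = OffsetDateTime.parse("2026-04-15T00:00:00+02:00");
        List<OrganisationAssignment> history = List.of(
                new OrganisationAssignment("vehicle-200", "147",
                        OffsetDateTime.parse("2025-01-01T00:00:00+01:00"), switchAt),
                new OrganisationAssignment("vehicle-200", "150", switchAt, null));
        System.out.println(organisationAt(history, "vehicle-200",
                OffsetDateTime.parse("2026-04-28T08:00:00+02:00"))); // Optional[150]
    }
}
```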

Driver ref:

"driverRef": {
  "sourceEntityId": "driver-100",
  "driverCard": {
    "nation": "AT",
    "number": "D123456789"
  }
}

Vehicle ref:

"vehicleRef": {
  "sourceEntityId": "vehicle-200",
  "vin": "WDB9634031L123456",
  "vehicleRegistration": {
    "nation": "AT",
    "number": "W-12345"
  }
}

Driver-card-only imports can carry only a nation-scoped VRN and no VIN:

"vehicleRef": {
  "sourceEntityId": null,
  "vin": null,
  "vehicleRegistration": {
    "nation": "AT",
    "number": "W-12345"
  }
}

Later master-data resolution can connect VRN + nation + occurredAt to a VIN/vehicle.

No cross-source deduplication during acquisition

The acquisition layer stores every source record independently. It uses sourceRecordKeyHash only to make re-imports of the same source event idempotent. The key is built from:

tenantKey + EventSource + externalSourceEventId

It also stores an eventSignatureHash. This is only a semantic hint for future query-time merging and gap filling: it is deliberately non-unique and must never suppress an import.
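The following sketch shows how the two hashes could be derived, assuming SHA-256 over length-prefixed UTF-8 fields. The exact fields and the encoding are assumptions, not the service's actual definition. These include which EventSource attributes enter the idempotency key and which semantic attributes enter the signature. The point is the contrast: the idempotency key includes the source, while the signature deliberately leaves it out so equivalent events from different sources can share a value.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.OffsetDateTime;
import java.util.HexFormat;

final class EventHashes {
    private EventHashes() {}

    /** SHA-256 over length-prefixed UTF-8 parts, so ("ab", "c") and ("a", "bc") hash differently. */
    static String sha256Hex(String... parts) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (String part : parts) {
                if (part == null) {
                    digest.update("N;".getBytes(StandardCharsets.UTF_8)); // keeps null distinct from ""
                    continue;
                }
                byte[] bytes = part.getBytes(StandardCharsets.UTF_8);
                digest.update((bytes.length + ";").getBytes(StandardCharsets.UTF_8));
                digest.update(bytes);
            }
            return HexFormat.of().formatHex(digest.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    /** Idempotency key: tenantKey + EventSource (assumed subset of fields) + externalSourceEventId. */
    static String sourceRecordKeyHash(String tenantKey, String providerKey, String sourceKind,
                                      String sourceKey, String sourceInstanceKey,
                                      String externalSourceEventId) {
        return sha256Hex(tenantKey, providerKey, sourceKind, sourceKey, sourceInstanceKey,
                externalSourceEventId);
    }

    /**
     * Non-unique semantic hint: no source fields, and occurredAt normalised to a UTC instant so
     * the same moment reported with different offsets yields the same signature.
     */
    static String eventSignatureHash(String tenantKey, OffsetDateTime occurredAt, String eventDomain,
                                     String eventType, String lifecycle, String driverCardNation,
                                     String driverCardNumber, String vehicleVin) {
        return sha256Hex(tenantKey, occurredAt.toInstant().toString(), eventDomain, eventType,
                lifecycle, driverCardNation, driverCardNumber, vehicleVin);
    }

    public static void main(String[] args) {
        System.out.println(sourceRecordKeyHash("kralowetz", "TACHOGRAPH", "VEHICLE_UNIT",
                "TACHOGRAPH_VEHICLE_UNIT", "main-tachograph-db",
                "TACHOGRAPH:VEHICLE_UNIT:activity:456:start"));
    }
}
```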

Tachograph import job model

For real tachograph DB extraction, use a tachograph import request. It describes the job and produces an import plan. The SQL extraction routes behind the plan are currently only scaffolded; implementing them is the next step.

POST /api/eventhub/acquisition/tachograph/imports/plan
POST /api/eventhub/acquisition/tachograph/imports/start
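Below is a minimal client sketch for these two endpoints using the JDK's java.net.http.HttpClient. The class name, the command-line arguments and the base URL are assumptions. The base URL defaults to http://localhost:8080, the Spring Boot default port. The request body is a JSON import request such as the examples below.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

final class TachographImportClient {
    private TachographImportClient() {}

    /** Builds a POST to .../tachograph/imports/{plan|start} with a JSON import request body. */
    static HttpRequest importRequest(String baseUrl, String action, String importRequestJson) {
        if (!action.equals("plan") && !action.equals("start")) {
            throw new IllegalArgumentException("action must be 'plan' or 'start': " + action);
        }
        URI uri = URI.create(baseUrl + "/api/eventhub/acquisition/tachograph/imports/" + action);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(importRequestJson))
                .build();
    }

    // Usage (hypothetical): java TachographImportClient.java plan request.json [baseUrl]
    public static void main(String[] args) throws IOException, InterruptedException {
        String action = args[0];
        String json = Files.readString(Path.of(args[1]));
        String baseUrl = args.length > 2 ? args[2] : "http://localhost:8080"; // assumed default port
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(importRequest(baseUrl, action, json), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```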

Example: initial import from one root organisation and its children:

{
  "tenantKey": "kralowetz",
  "eventSource": {
    "providerKey": "TACHOGRAPH",
    "sourceKind": "MIXED",
    "sourceKey": "TACHOGRAPH_DB",
    "sourceInstanceKey": "main-tachograph-db",
    "tenantProviderSettingKey": "kralowetz-tachograph-prod"
  },
  "sourceGroup": {
    "type": "ORGANISATION",
    "sourceEntityId": "147",
    "code": "147",
    "name": "Kralowetz"
  },
  "importScope": {
    "type": "SOURCE_ORGANISATION_SUBTREE",
    "rootSourceOrganisation": {
      "type": "ORGANISATION",
      "sourceEntityId": "147",
      "code": "147",
      "name": "Kralowetz"
    },
    "includeChildren": true,
    "occurredFrom": "2025-01-01T00:00:00+01:00",
    "occurredTo": null
  },
  "eventFamilies": [
    "DRIVER_ACTIVITY",
    "DRIVER_CARD",
    "POSITION",
    "BORDER_CROSSING",
    "LOAD_UNLOAD",
    "PLACE",
    "SPECIFIC_CONDITION",
    "SPEEDING"
  ],
  "mode": "INITIAL_BACKFILL",
  "refreshMasterDataFirst": true,
  "acquisitionStrategy": "OCCURRED_AT_WINDOW_WITH_OVERLAP"
}

Example: regular incremental update:

{
  "tenantKey": "kralowetz",
  "eventSource": {
    "providerKey": "TACHOGRAPH",
    "sourceKind": "MIXED",
    "sourceKey": "TACHOGRAPH_DB",
    "sourceInstanceKey": "main-tachograph-db",
    "tenantProviderSettingKey": "kralowetz-tachograph-prod"
  },
  "sourceGroup": {
    "type": "ORGANISATION",
    "sourceEntityId": "147"
  },
  "importScope": {
    "type": "SOURCE_ORGANISATION_SUBTREE",
    "rootSourceOrganisation": {
      "type": "ORGANISATION",
      "sourceEntityId": "147"
    },
    "includeChildren": true,
    "occurredFrom": null,
    "occurredTo": null
  },
  "eventFamilies": ["DRIVER_ACTIVITY", "DRIVER_CARD", "POSITION", "BORDER_CROSSING", "LOAD_UNLOAD", "PLACE", "SPECIFIC_CONDITION", "SPEEDING"],
  "mode": "INCREMENTAL_UPDATE",
  "refreshMasterDataFirst": true,
  "acquisitionStrategy": "SOURCE_PACKAGE_WATERMARK"
}

Tachograph extraction plan

The import-plan service currently creates extraction definitions such as the following (event family / source kind -> source data):

DRIVER_ACTIVITY    / VEHICLE_UNIT -> VUActivity
DRIVER_ACTIVITY    / DRIVER_CARD  -> CardActivity
DRIVER_CARD        / VEHICLE_UNIT -> IWCycle
DRIVER_CARD        / DRIVER_CARD  -> CardVehiclesUsed
POSITION           / VEHICLE_UNIT -> VUPlaces, VULoadUnload, VUGnssAccumulatedDriving, VUBorderCrossing
POSITION           / DRIVER_CARD  -> CardPlaces, CardLoadUnload, CardGnssAccumulatedDriving, CardBorderCrossing
BORDER_CROSSING    / VEHICLE_UNIT -> VUBorderCrossing
BORDER_CROSSING    / DRIVER_CARD  -> CardBorderCrossing
LOAD_UNLOAD        / VEHICLE_UNIT -> VULoadUnload
LOAD_UNLOAD        / DRIVER_CARD  -> CardLoadUnload
SPECIFIC_CONDITION / VEHICLE_UNIT -> VUSpecificCondition
SPECIFIC_CONDITION / DRIVER_CARD  -> CardSpecificCondition
PLACE              / VEHICLE_UNIT -> VUPlaces
PLACE              / DRIVER_CARD  -> CardPlaces
SPEEDING           / VEHICLE_UNIT -> SpeedingEvents

The next implementation step is to replace the scaffolded plan items with actual Camel/JDBC SQL extraction routes.
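Several families map to the same source data; for example, VUBorderCrossing appears under both POSITION and BORDER_CROSSING. This README does not say whether each source data set should be read only once per job. If that is the goal, the plan could collect the distinct sources for the requested families. The following sketch covers the VEHICLE_UNIT column of the mapping above, with hypothetical class and method names.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ExtractionPlanSketch {
    private ExtractionPlanSketch() {}

    // VEHICLE_UNIT column of the extraction definitions listed above.
    static final Map<String, List<String>> VEHICLE_UNIT_SOURCES = Map.of(
            "DRIVER_ACTIVITY", List.of("VUActivity"),
            "DRIVER_CARD", List.of("IWCycle"),
            "POSITION", List.of("VUPlaces", "VULoadUnload", "VUGnssAccumulatedDriving", "VUBorderCrossing"),
            "BORDER_CROSSING", List.of("VUBorderCrossing"),
            "LOAD_UNLOAD", List.of("VULoadUnload"),
            "SPECIFIC_CONDITION", List.of("VUSpecificCondition"),
            "PLACE", List.of("VUPlaces"),
            "SPEEDING", List.of("SpeedingEvents"));

    /** Distinct source data sets for the requested families, in first-requested order. */
    static Set<String> sourcesFor(List<String> eventFamilies) {
        Set<String> sources = new LinkedHashSet<>();
        for (String family : eventFamilies) {
            List<String> mapped = VEHICLE_UNIT_SOURCES.get(family);
            if (mapped == null) {
                throw new IllegalArgumentException("no VEHICLE_UNIT mapping for " + family);
            }
            sources.addAll(mapped);
        }
        return sources;
    }

    public static void main(String[] args) {
        System.out.println(sourcesFor(List.of("POSITION", "BORDER_CROSSING", "LOAD_UNLOAD")));
        // [VUPlaces, VULoadUnload, VUGnssAccumulatedDriving, VUBorderCrossing]
    }
}
```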

Acquisition alternatives considered

Alternative A: occurred-time window import

Read events by occurredAt for a root organisation/time window.

Pros:

simple
works for initial backfill
matches explicit from/to import requests

Cons:

unsafe as the only incremental method, because a newly imported card/VU package can contain events with old occurredAt values
requires overlap windows for regular updates

Best use:

initial backfill and reprocessing
fallback incremental strategy with overlap
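The following sketch computes the occurredAt window for the next regular run, assuming the end of the last successful window is persisted per job. The overlap length is a hypothetical tuning parameter and must cover how late tachograph packages can arrive. Events in the overlap are read again, which is harmless because imports are idempotent by sourceRecordKeyHash.

```java
import java.time.Duration;
import java.time.OffsetDateTime;

final class OverlapWindow {
    private OverlapWindow() {}

    /** Half-open occurredAt window [occurredFrom, occurredTo). */
    record Window(OffsetDateTime occurredFrom, OffsetDateTime occurredTo) {}

    /**
     * Next incremental window: starts `overlap` before the end of the last successful window
     * and ends at `now`. Events in the overlap are read again and skipped as duplicates.
     */
    static Window next(OffsetDateTime lastSuccessfulTo, Duration overlap, OffsetDateTime now) {
        if (overlap.isNegative()) {
            throw new IllegalArgumentException("overlap must not be negative");
        }
        if (now.isBefore(lastSuccessfulTo)) {
            throw new IllegalArgumentException("now is before the last successful window end");
        }
        return new Window(lastSuccessfulTo.minus(overlap), now);
    }

    public static void main(String[] args) {
        Window w = next(OffsetDateTime.parse("2026-04-28T06:00:00+02:00"), Duration.ofDays(3),
                OffsetDateTime.parse("2026-04-29T06:00:00+02:00"));
        System.out.println(w.occurredFrom() + " -> " + w.occurredTo());
    }
}
```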

Alternative B: source-package watermark import

Read original tachograph card/VU packages that were imported/changed in the tachograph DB since the last successful EventHub run, then extract all events belonging to those packages.

Pros:

best for regular updates
handles late-arriving historical tachograph packages
fits the tachograph package concept

Cons:

requires reliable source package metadata and links from event rows to package/source download
more complex SQL and cursor state

Best use:

primary incremental strategy if the tachograph DB exposes package import timestamps/ids
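The following sketch shows the cursor logic, assuming the tachograph DB exposes an import timestamp and an id per source package; both names are hypothetical. The cursor uses the pair (importedAt, packageId) rather than the timestamp alone. With the pair, packages that share an import timestamp are neither skipped nor re-read when a run stops between them. In SQL, the same condition can be written portably as importedAt > ? OR (importedAt = ? AND packageId > ?), ordered by both columns.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

final class PackageWatermark {
    private PackageWatermark() {}

    /** Hypothetical source-package metadata row from the tachograph DB. */
    record SourcePackage(long packageId, Instant importedAt) {}

    /** Persisted position of the last package handled by a successful EventHub run. */
    record Cursor(Instant importedAt, long packageId) {}

    private static final Comparator<SourcePackage> ORDER =
            Comparator.comparing(SourcePackage::importedAt).thenComparingLong(SourcePackage::packageId);

    /** True if the package comes strictly after the cursor; a null cursor means "never run". */
    static boolean isAfter(SourcePackage p, Cursor cursor) {
        if (cursor == null) {
            return true;
        }
        int byTime = p.importedAt().compareTo(cursor.importedAt());
        return byTime > 0 || (byTime == 0 && p.packageId() > cursor.packageId());
    }

    /** Packages still to process, in cursor order. */
    static List<SourcePackage> pending(List<SourcePackage> packages, Cursor cursor) {
        return packages.stream().filter(p -> isAfter(p, cursor)).sorted(ORDER).toList();
    }

    /** Cursor to persist after all given packages were imported successfully. */
    static Cursor advance(List<SourcePackage> processedInOrder, Cursor current) {
        if (processedInOrder.isEmpty()) {
            return current;
        }
        SourcePackage last = processedInOrder.get(processedInOrder.size() - 1);
        return new Cursor(last.importedAt(), last.packageId());
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2026-04-28T06:00:00Z");
        Instant t1 = Instant.parse("2026-04-28T07:00:00Z");
        List<SourcePackage> packages = List.of(
                new SourcePackage(3, t1), new SourcePackage(1, t0), new SourcePackage(2, t1));
        Cursor cursor = new Cursor(t1, 2); // last run stopped after package 2
        List<SourcePackage> todo = pending(packages, cursor);
        System.out.println(todo);          // only package 3 is left
        System.out.println(advance(todo, cursor));
    }
}
```

A timestamp watermark can still miss packages whose import transaction commits after a later timestamp was already read, which is one reason to keep the occurredAt overlap window as a fallback.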

Alternative C: source-row watermark import

Read source event rows changed since the last run, using row-level updatedAt or monotonic IDs.

Pros:

precise if row update timestamps are reliable
does not require package-level model

Cons:

not possible if source tables do not have reliable changed/updated metadata
harder across many event tables

Best use:

fallback when rows have reliable updatedAt/row version fields

Alternative D: per vehicle/per driver polling

After master-data refresh, loop through vehicles and drivers in the selected organisation subtree and read their event data.

Pros:

matches the existing data acquisition pattern
naturally separates vehicle-unit and driver-card data
supports organisation-scoped imports well

Cons:

can be slower for large fleets
requires careful batching/chunking and parallelism
can miss late-arriving historical data unless combined with a package/row watermark or an overlap window

Best use:

scope resolution and controlled extraction, combined with Alternative A or B

Recommended: a hybrid of these alternatives:

Initial import:
  master data first
  organisation subtree + occurredFrom/occurredTo
  chunk by time and/or vehicle/driver
  import idempotently by sourceRecordKeyHash

Regular update:
  master data first
  prefer source-package watermark
  fallback to occurredAt overlap window if package metadata is insufficient
  import idempotently by sourceRecordKeyHash

In this model, an EventHub acquisition package is an extraction package. The original tachograph card/VU package should be preserved as source metadata, either in the event payload or, later, in a dedicated source-package table.
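For the initial import, chunking by time could split the requested occurredAt range into consecutive half-open windows, each extracted as one run. This is a minimal sketch, and the chunk size is an assumed tuning parameter. The initial-backfill example leaves occurredTo open, so that value would first have to be replaced by a concrete end such as the job start time.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.util.ArrayList;
import java.util.List;

final class TimeChunks {
    private TimeChunks() {}

    /** Half-open occurredAt window [occurredFrom, occurredTo). */
    record Window(OffsetDateTime occurredFrom, OffsetDateTime occurredTo) {}

    /** Splits [from, to) into consecutive windows of at most `chunk`; the last one may be shorter. */
    static List<Window> split(OffsetDateTime from, OffsetDateTime to, Duration chunk) {
        if (chunk.isZero() || chunk.isNegative()) {
            throw new IllegalArgumentException("chunk must be positive");
        }
        List<Window> windows = new ArrayList<>();
        OffsetDateTime start = from;
        while (start.isBefore(to)) {
            OffsetDateTime end = start.plus(chunk);
            if (end.isAfter(to)) {
                end = to; // clamp the last window to the requested end
            }
            windows.add(new Window(start, end));
            start = end;
        }
        return windows;
    }

    public static void main(String[] args) {
        split(OffsetDateTime.parse("2025-01-01T00:00:00+01:00"),
                OffsetDateTime.parse("2025-01-03T12:00:00+01:00"),
                Duration.ofDays(1)).forEach(System.out::println);
    }
}
```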

Existing package-level normalized event ingestion

POST /api/eventhub/acquisition/packages

Example:

{
  "package": {
    "tenantKey": "kralowetz",
    "eventSource": {
      "providerKey": "TACHOGRAPH",
      "sourceKind": "VEHICLE_UNIT",
      "sourceKey": "TACHOGRAPH_VEHICLE_UNIT",
      "sourceInstanceKey": "main-tachograph-db",
      "tenantProviderSettingKey": "kralowetz-tachograph-prod"
    },
    "sourceGroup": {
      "type": "ORGANISATION",
      "sourceEntityId": "147"
    },
    "importScope": {
      "type": "SOURCE_ORGANISATION_SUBTREE",
      "rootSourceOrganisation": {
        "type": "ORGANISATION",
        "sourceEntityId": "147"
      },
      "includeChildren": true,
      "occurredFrom": "2026-04-28T00:00:00+02:00",
      "occurredTo": "2026-04-29T00:00:00+02:00"
    },
    "eventFamily": "DRIVER_ACTIVITY",
    "businessDate": "2026-04-28",
    "externalPackageId": "TACHOGRAPH:ORG-147-SUBTREE:DRIVER_ACTIVITY:2026-04-28"
  },
  "events": [
    {
      "externalSourceEventId": "TACHOGRAPH:VEHICLE_UNIT:activity:456:start",
      "driverRef": {
        "sourceEntityId": "driver-100",
        "driverCard": {
          "nation": "AT",
          "number": "D123456789"
        }
      },
      "vehicleRef": {
        "sourceEntityId": "vehicle-200",
        "vin": "WDB9634031L123456",
        "vehicleRegistration": {
          "nation": "AT",
          "number": "W-12345"
        }
      },
      "occurredAt": "2026-04-28T08:00:00+02:00",
      "eventDomain": "DRIVER_ACTIVITY",
      "eventType": "DRIVE",
      "lifecycle": "START",
      "eventDetails": {
        "type": "DRIVER_ACTIVITY",
        "attributes": {
          "cardSlot": "DRIVER",
          "cardStatus": "INSERTED",
          "drivingStatus": "SINGLE"
        }
      },
      "payload": {
        "raw": {
          "activity": 3,
          "cardSlot": 0,
          "cardStatus": 0,
          "drivingStatus": 0
        }
      }
    }
  ]
}
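The raw values in payload.raw are consistent with the EU tachograph ActivityChangeInfo coding:

- slot: 0 = driver, 1 = co-driver
- card status: 0 = inserted, 1 = not inserted
- driving status (while a card is inserted): 0 = single, 1 = crew
- activity: 0 = break/rest, 1 = availability, 2 = work, 3 = driving

Assuming the source columns follow that coding, a mapper to eventType and eventDetails.attributes could look like the sketch below. Only DRIVE, DRIVER, INSERTED and SINGLE appear in this README; the other labels are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TachographActivityCodes {
    private TachographActivityCodes() {}

    /** Maps the 2-bit activity code to an eventType; only DRIVE is taken from this README. */
    static String eventType(int activity) {
        return switch (activity) {
            case 0 -> "BREAK_REST";   // hypothetical label
            case 1 -> "AVAILABILITY"; // hypothetical label
            case 2 -> "WORK";         // hypothetical label
            case 3 -> "DRIVE";
            default -> throw new IllegalArgumentException("unknown activity code: " + activity);
        };
    }

    /** Builds eventDetails.attributes from the raw slot/status bits (0 or 1 each). */
    static Map<String, String> attributes(int cardSlot, int cardStatus, int drivingStatus) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("cardSlot", cardSlot == 0 ? "DRIVER" : "CO_DRIVER");
        attributes.put("cardStatus", cardStatus == 0 ? "INSERTED" : "NOT_INSERTED");
        // The driving-status bit means SINGLE/CREW only while a card is inserted; its meaning
        // without a card is not decoded in this sketch.
        if (cardStatus == 0) {
            attributes.put("drivingStatus", drivingStatus == 0 ? "SINGLE" : "CREW");
        }
        return attributes;
    }

    public static void main(String[] args) {
        // Raw values from the package example above: activity 3, slot 0, card status 0, driving status 0.
        System.out.println(eventType(3) + " " + attributes(0, 0, 0));
    }
}
```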

Routes

direct:yellowfox-d8-booking-input
direct:telematics-position-input
direct:tachograph-activity-input
direct:tachograph-import-start
direct:eventhub-package-input
direct:eventhub-manual-input

Common route:

direct:eventhub-normalized-input
    -> validate EventHubEventDto
    -> create package key from tenant + EventSource + sourceGroup + importScope + eventFamily
    -> seda:eventhub-batch-input
    -> aggregate by eventhub.packageKey
    -> sort by occurredAt inside the batch
    -> EventHubIngestionService.ingest(...)
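In the service, the grouping and ordering step above runs in the Camel aggregator on the seda queue. The plain-Java sketch below only illustrates the intended semantics: one batch per package key, with events sorted by occurredAt. It uses a hypothetical simplified event record and builds the package key by joining its parts. It is not the route implementation.

```java
import java.time.OffsetDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PackageBatching {
    private PackageBatching() {}

    /** Hypothetical simplified event carrying an already computed package key. */
    record Event(String packageKey, OffsetDateTime occurredAt, String externalSourceEventId) {}

    /** Package key from tenant + EventSource + sourceGroup + importScope + eventFamily (joined parts). */
    static String packageKey(String tenantKey, String eventSourceKey, String sourceGroupKey,
                             String importScopeKey, String eventFamily) {
        return String.join("|", tenantKey, eventSourceKey, sourceGroupKey, importScopeKey, eventFamily);
    }

    /** One batch per package key, each sorted by occurredAt (earliest first). */
    static Map<String, List<Event>> batches(List<Event> events) {
        Map<String, List<Event>> byKey = new TreeMap<>();
        for (Event event : events) {
            byKey.computeIfAbsent(event.packageKey(), key -> new ArrayList<>()).add(event);
        }
        // List.sort is stable, so events with equal timestamps keep their arrival order.
        byKey.values().forEach(batch -> batch.sort(Comparator.comparing(Event::occurredAt)));
        return byKey;
    }

    public static void main(String[] args) {
        String key = packageKey("kralowetz", "TACHOGRAPH/VEHICLE_UNIT", "ORGANISATION:147",
                "SOURCE_ORGANISATION_SUBTREE:147", "DRIVER_ACTIVITY");
        OffsetDateTime t = OffsetDateTime.parse("2026-04-28T08:00:00+02:00");
        System.out.println(batches(List.of(
                new Event(key, t.plusHours(2), "end"), new Event(key, t, "start"))));
    }
}
```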

Start PostgreSQL

docker compose up -d

Run the service

mvn spring-boot:run

Check acquisition packages

select p.received_at,
       p.tenant_key,
       s.provider_key,
       s.source_kind,
       s.source_key,
       p.source_group_type,
       p.source_group_entity_id,
       p.import_scope_type,
       p.root_source_org_entity_id,
       p.occurred_from,
       p.occurred_to,
       p.event_family,
       p.business_date,
       p.status,
       p.event_count
from eventhub.data_package p
join eventhub.event_source s on s.id = p.event_source_id
order by p.received_at desc;

Check acquired events

select occurred_at,
       driver_source_entity_id,
       driver_card_nation,
       driver_card_number,
       vehicle_source_entity_id,
       vehicle_vin,
       vehicle_registration_nation,
       vehicle_registration_number,
       event_domain,
       event_type,
       lifecycle,
       event_signature_hash,
       event_details,
       payload
from eventhub.acquired_event
order by occurred_at desc;

Next implementation steps

  1. Add actual Camel/JDBC extraction routes behind the tachograph import plan.
  2. Implement master-data acquisition first: organisation tree, driver/card assignments, vehicle VIN/VRN assignments, driver/vehicle organisation assignment histories.
  3. Implement initial backfill using organisation/time scope.
  4. Implement incremental import using source-package watermark, with occurredAt overlap fallback.
  5. Discuss query/read models later: source priority and gap filling across tachograph, YellowFox and other sources.