Go to file
trifonovt c52712f881 Add tachograph import run planning 2026-04-30 12:40:12 +02:00
docs/timescale Initial eventhub ingestion service 2026-04-30 11:01:01 +02:00
src Add tachograph import run planning 2026-04-30 12:40:12 +02:00
.gitignore Initial eventhub ingestion service 2026-04-30 11:01:01 +02:00
README.md Add tachograph import run planning 2026-04-30 12:40:12 +02:00
docker-compose.yml Initial eventhub ingestion service 2026-04-30 11:01:01 +02:00
pom.xml Initial eventhub ingestion service 2026-04-30 11:01:01 +02:00

README.md

EventHub Acquisition Service

Spring Boot + Apache Camel skeleton for acquiring normalized EventHub point events from multiple providers/sources.

The current version focuses on acquisition from source systems, especially tachograph DB data. It stores source records as imported. It does not merge or deduplicate equivalent events from different providers/sources. It does keep a non-unique eventSignatureHash as a future query/projection hint.

Namespace

at.procon.eventhub

Main model decisions

One event = one point in time

EventHubEventDto has exactly one timestamp:

occurredAt

There is no generic duration, endTime, validFrom, or validTo. If a source row represents an interval, a mapper may emit separate point events such as DRIVE START and DRIVE END.

Tenant is package/job-level

tenantKey identifies the customer/data owner. It is mandatory for import packages and tachograph import requests.

EventSource identifies the technical source

Example:

{
  "providerKey": "TACHOGRAPH",
  "sourceKind": "VEHICLE_UNIT",
  "sourceKey": "TACHOGRAPH_VEHICLE_UNIT",
  "sourceInstanceKey": "main-tachograph-db",
  "tenantProviderSettingKey": "kralowetz-tachograph-prod",
  "externalFleetKey": null
}

Examples:

TACHOGRAPH / VEHICLE_UNIT
TACHOGRAPH / DRIVER_CARD
YELLOWFOX / TELEMATICS_PLATFORM / YELLOWFOX_D8
FLEETBOARD / TELEMATICS_PLATFORM / FLEETBOARD_POSITION

SourceGroup is package/source grouping only

For tachograph, sourceGroup can identify the selected source organisation/root organisation.

"sourceGroup": {
  "type": "ORGANISATION",
  "sourceEntityId": "147",
  "code": "147",
  "name": "Kralowetz"
}

For YellowFox, it can identify the provider fleet.

"sourceGroup": {
  "type": "FLEET",
  "sourceEntityId": "7",
  "code": "7",
  "name": "YellowFox Fleet 7"
}

YellowFox fleet is not forced to be an organisation. It belongs to the same tenant/customer and can later be mapped or resolved through vehicle/driver master data if needed.

ImportScope describes data selection

importScope describes what was selected from the source system.

Full DB import:

"importScope": {
  "type": "TENANT_ALL",
  "rootSourceOrganisation": null,
  "includeChildren": false,
  "occurredFrom": null,
  "occurredTo": null
}

Organisation subtree + time-window import:

"importScope": {
  "type": "SOURCE_ORGANISATION_SUBTREE",
  "rootSourceOrganisation": {
    "type": "ORGANISATION",
    "sourceEntityId": "147",
    "code": "147",
    "name": "Kralowetz"
  },
  "includeChildren": true,
  "occurredFrom": "2026-04-28T00:00:00+02:00",
  "occurredTo": "2026-04-29T00:00:00+02:00"
}

occurredFrom is inclusive. occurredTo is exclusive. Both can be null for complete DB/history imports.

Driver/vehicle refs do not contain organisation

Organisation assignment is a master-data relation, not an event property.

Events depend on driver and/or vehicle. The relation of organisation to driver/vehicle is imported and resolved separately from master data using occurredAt.

Driver ref:

"driverRef": {
  "sourceEntityId": "driver-100",
  "driverCard": {
    "nation": "AT",
    "number": "D123456789"
  }
}

Vehicle ref:

"vehicleRef": {
  "sourceEntityId": "vehicle-200",
  "vin": "WDB9634031L123456",
  "vehicleRegistration": {
    "nation": "AT",
    "number": "W-12345"
  }
}

Driver-card-only imports can carry only a nation-scoped VRN and no VIN:

"vehicleRef": {
  "sourceEntityId": null,
  "vin": null,
  "vehicleRegistration": {
    "nation": "AT",
    "number": "W-12345"
  }
}

Later master-data resolution can connect VRN + nation + occurredAt to a VIN/vehicle.

No cross-source deduplication during acquisition

The acquisition layer stores every source record independently. It uses sourceRecordKeyHash only for idempotency of the same source event:

tenantKey + EventSource + externalSourceEventId

It also stores a non-unique eventSignatureHash. This is only a semantic hint for future query-time merging/gap filling. It is not unique and must not suppress imports.

Tachograph import job model

For real tachograph DB extraction, use a tachograph import request. This describes the job and produces an import plan. SQL extraction routes are intentionally scaffolded as the next implementation step.

POST /api/eventhub/acquisition/tachograph/imports/plan
POST /api/eventhub/acquisition/tachograph/imports/start

Example: initial import from one root organisation and its children:

{
  "tenantKey": "kralowetz",
  "eventSource": {
    "providerKey": "TACHOGRAPH",
    "sourceKind": "MIXED",
    "sourceKey": "TACHOGRAPH_DB",
    "sourceInstanceKey": "main-tachograph-db",
    "tenantProviderSettingKey": "kralowetz-tachograph-prod"
  },
  "sourceGroup": {
    "type": "ORGANISATION",
    "sourceEntityId": "147",
    "code": "147",
    "name": "Kralowetz"
  },
  "importScope": {
    "type": "SOURCE_ORGANISATION_SUBTREE",
    "rootSourceOrganisation": {
      "type": "ORGANISATION",
      "sourceEntityId": "147",
      "code": "147",
      "name": "Kralowetz"
    },
    "includeChildren": true,
    "occurredFrom": "2025-01-01T00:00:00+01:00",
    "occurredTo": null
  },
  "eventFamilies": [
    "DRIVER_ACTIVITY",
    "DRIVER_CARD",
    "POSITION",
    "BORDER_CROSSING",
    "LOAD_UNLOAD",
    "PLACE",
    "SPECIFIC_CONDITION",
    "SPEEDING"
  ],
  "mode": "INITIAL_BACKFILL",
  "refreshMasterDataFirst": true,
  "acquisitionStrategy": "OCCURRED_AT_WINDOW_WITH_OVERLAP"
}

Example: regular incremental update:

{
  "tenantKey": "kralowetz",
  "eventSource": {
    "providerKey": "TACHOGRAPH",
    "sourceKind": "MIXED",
    "sourceKey": "TACHOGRAPH_DB",
    "sourceInstanceKey": "main-tachograph-db",
    "tenantProviderSettingKey": "kralowetz-tachograph-prod"
  },
  "sourceGroup": {
    "type": "ORGANISATION",
    "sourceEntityId": "147"
  },
  "importScope": {
    "type": "SOURCE_ORGANISATION_SUBTREE",
    "rootSourceOrganisation": {
      "type": "ORGANISATION",
      "sourceEntityId": "147"
    },
    "includeChildren": true,
    "occurredFrom": null,
    "occurredTo": null
  },
  "eventFamilies": ["DRIVER_ACTIVITY", "DRIVER_CARD", "POSITION", "BORDER_CROSSING", "LOAD_UNLOAD", "PLACE", "SPECIFIC_CONDITION", "SPEEDING"],
  "mode": "INCREMENTAL_UPDATE",
  "refreshMasterDataFirst": true,
  "acquisitionStrategy": "SOURCE_PACKAGE_WATERMARK"
}

Tachograph extraction plan

The import-plan service currently creates extraction definitions like:

DRIVER_ACTIVITY / VEHICLE_UNIT -> VUActivity
DRIVER_ACTIVITY / DRIVER_CARD  -> CardActivity
DRIVER_CARD     / VEHICLE_UNIT -> IWCycle
DRIVER_CARD     / DRIVER_CARD  -> CardVehiclesUsed
POSITION        / VEHICLE_UNIT -> VUPlaces, VULoadUnload, VUGnssAccumulatedDriving, VUBorderCrossing
POSITION        / DRIVER_CARD  -> CardPlaces, CardLoadUnload, CardGnssAccumulatedDriving, CardBorderCrossing
BORDER_CROSSING / VEHICLE_UNIT -> VUBorderCrossing
BORDER_CROSSING / DRIVER_CARD  -> CardBorderCrossing
LOAD_UNLOAD     / VEHICLE_UNIT -> VULoadUnload
LOAD_UNLOAD     / DRIVER_CARD  -> CardLoadUnload
SPECIFIC_CONDITION / VEHICLE_UNIT -> VUSpecificCondition
SPECIFIC_CONDITION / DRIVER_CARD  -> CardSpecificCondition
PLACE           / VEHICLE_UNIT -> VUPlaces
PLACE           / DRIVER_CARD  -> CardPlaces
SPEEDING        / VEHICLE_UNIT -> SpeedingEvents

The next implementation step is to replace the scaffolded plan items with actual Camel/JDBC SQL extraction routes.

Acquisition alternatives considered

Alternative A: occurred-time window import

Read events by occurredAt for a root organisation/time window.

Pros:

simple
works for initial backfill
matches explicit from/to import requests

Cons:

unsafe as the only incremental method because a newly imported card/VU package can contain old occurredAt data
requires overlap windows for regular updates

Best use:

initial backfill and reprocessing
fallback incremental strategy with overlap

Alternative B: source-package watermark import

Read original tachograph card/VU packages that were imported/changed in the tachograph DB since the last successful EventHub run, then extract all events belonging to those packages.

Pros:

best for regular updates
handles late-arriving historical tachograph packages
fits the tachograph package concept

Cons:

requires reliable source package metadata and links from event rows to package/source download
more complex SQL and cursor state

Best use:

primary incremental strategy if tachograph DB exposes package import timestamps/ids

Alternative C: source-row watermark import

Read source event rows changed since last run using row-level updatedAt or monotonic IDs.

Pros:

precise if row update timestamps are reliable
does not require package-level model

Cons:

not possible if source tables do not have reliable changed/updated metadata
harder across many event tables

Best use:

fallback when rows have reliable updatedAt/row version fields

Alternative D: per vehicle/per driver polling

After master-data refresh, loop through vehicles and drivers in the selected organisation subtree and read their event data.

Pros:

matches your existing data acquisition pattern
naturally separates vehicle-unit and driver-card data
supports organisation-scoped imports well

Cons:

can be slower for large fleets
requires careful batching/chunking and parallelism
can miss late old data unless combined with package/row watermark or overlap

Best use:

scope resolution and controlled extraction, combined with Alternative A or B

Use a hybrid:

Initial import:
  master data first
  organisation subtree + occurredFrom/occurredTo
  chunk by time and/or vehicle/driver
  import idempotently by sourceRecordKeyHash

Regular update:
  master data first
  prefer source-package watermark
  fallback to occurredAt overlap window if package metadata is insufficient
  import idempotently by sourceRecordKeyHash

This means the EventHub acquisition package is an extraction package, while the original tachograph card/VU package should be preserved as source metadata in payload or later in a dedicated source-package table.

Existing package-level normalized event ingestion

POST /api/eventhub/acquisition/packages

Example:

{
  "package": {
    "tenantKey": "kralowetz",
    "eventSource": {
      "providerKey": "TACHOGRAPH",
      "sourceKind": "VEHICLE_UNIT",
      "sourceKey": "TACHOGRAPH_VEHICLE_UNIT",
      "sourceInstanceKey": "main-tachograph-db",
      "tenantProviderSettingKey": "kralowetz-tachograph-prod"
    },
    "sourceGroup": {
      "type": "ORGANISATION",
      "sourceEntityId": "147"
    },
    "importScope": {
      "type": "SOURCE_ORGANISATION_SUBTREE",
      "rootSourceOrganisation": {
        "type": "ORGANISATION",
        "sourceEntityId": "147"
      },
      "includeChildren": true,
      "occurredFrom": "2026-04-28T00:00:00+02:00",
      "occurredTo": "2026-04-29T00:00:00+02:00"
    },
    "eventFamily": "DRIVER_ACTIVITY",
    "businessDate": "2026-04-28",
    "externalPackageId": "TACHOGRAPH:ORG-147-SUBTREE:DRIVER_ACTIVITY:2026-04-28"
  },
  "events": [
    {
      "externalSourceEventId": "TACHOGRAPH:VEHICLE_UNIT:activity:456:start",
      "driverRef": {
        "sourceEntityId": "driver-100",
        "driverCard": {
          "nation": "AT",
          "number": "D123456789"
        }
      },
      "vehicleRef": {
        "sourceEntityId": "vehicle-200",
        "vin": "WDB9634031L123456",
        "vehicleRegistration": {
          "nation": "AT",
          "number": "W-12345"
        }
      },
      "occurredAt": "2026-04-28T08:00:00+02:00",
      "eventDomain": "DRIVER_ACTIVITY",
      "eventType": "DRIVE",
      "lifecycle": "START",
      "eventDetails": {
        "type": "DRIVER_ACTIVITY",
        "attributes": {
          "cardSlot": "DRIVER",
          "cardStatus": "INSERTED",
          "drivingStatus": "SINGLE"
        }
      },
      "payload": {
        "raw": {
          "activity": 3,
          "cardSlot": 0,
          "cardStatus": 0,
          "drivingStatus": 0
        }
      }
    }
  ]
}

Routes

direct:yellowfox-d8-booking-input
direct:telematics-position-input
direct:tachograph-activity-input
direct:tachograph-import-start
direct:eventhub-package-input
direct:eventhub-manual-input

Common route:

direct:eventhub-normalized-input
    -> validate EventHubEventDto
    -> create package key from tenant + EventSource + sourceGroup + importScope + eventFamily
    -> seda:eventhub-batch-input
    -> aggregate by eventhub.packageKey
    -> sort by occurredAt inside the batch
    -> EventHubIngestionService.ingest(...)

Start PostgreSQL

docker compose up -d

Run the service

mvn spring-boot:run

Check acquisition packages

select p.received_at,
       p.tenant_key,
       s.provider_key,
       s.source_kind,
       s.source_key,
       p.source_group_type,
       p.source_group_entity_id,
       p.import_scope_type,
       p.root_source_org_entity_id,
       p.occurred_from,
       p.occurred_to,
       p.event_family,
       p.business_date,
       p.status,
       p.event_count
from eventhub.data_package p
join eventhub.event_source s on s.id = p.event_source_id
order by p.received_at desc;

Check acquired events

select occurred_at,
       driver_source_entity_id,
       driver_card_nation,
       driver_card_number,
       vehicle_source_entity_id,
       vehicle_vin,
       vehicle_registration_nation,
       vehicle_registration_number,
       event_domain,
       event_type,
       lifecycle,
       event_signature_hash,
       event_details,
       payload
from eventhub.acquired_event
order by occurred_at desc;

Next implementation steps

  1. Add actual Camel/JDBC extraction routes behind the tachograph import plan.
  2. Implement master-data acquisition first: organisation tree, driver/card assignments, vehicle VIN/VRN assignments, driver/vehicle organisation assignment histories.
  3. Implement initial backfill using organisation/time scope.
  4. Implement incremental import using source-package watermark, with occurredAt overlap fallback.
  5. Discuss query/read models later: source priority and gap filling across tachograph, YellowFox and other sources.

Implemented tachograph ingestion run model

The tachograph import model now follows the agreed design:

Initial import
  organisation subtree + occurredFrom/occurredTo
  chunked by time and/or entity
  idempotent inserts by sourceRecordKeyHash

Regular update
  refresh master data first
  prefer discovery of new/changed original tachograph source packages
  extract affected event families
  import idempotently
  advance cursor after successful extraction

Package model
  import_run = one execution of the tachograph importer
  data_package = one EventHub extraction batch
  sourcePackageRef = original tachograph driver-card/VU package reference

Deduplication
  no cross-source deduplication
  sourceRecordKeyHash prevents same-source duplicate imports
  eventSignatureHash is non-unique and only helps later query/projection logic

Import run vs tachograph source package

A tachograph database already contains packages from driver cards and vehicle-unit devices. The EventHub model does not force those original packages to become EventHub packages. Instead:

Original tachograph source package
  card/VU package imported into tachograph DB
  stored as sourcePackageRef when extracting events

EventHub data package
  extraction batch produced by an EventHub import run
  grouped by tenant, EventSource, event family, extraction code and time chunk

This allows a single EventHub import run to process many original tachograph packages, and a single extraction batch may contain rows from many original packages. If the SQL extractor can return source package metadata, it should populate EventHubEventDto.sourcePackageRef:

"sourcePackageRef": {
  "packageKind": "VEHICLE_UNIT",
  "sourcePackageId": "VU-PACKAGE-12345",
  "sourceEntityId": "vehicle-90021",
  "packagePeriodFrom": "2026-04-01T00:00:00+02:00",
  "packagePeriodTo": "2026-04-15T00:00:00+02:00",
  "importedIntoSourceAt": "2026-04-16T10:30:00+02:00"
}

Tachograph import endpoints

POST /api/eventhub/acquisition/tachograph/imports/plan returns the calculated event-family extraction plan and time chunks.

POST /api/eventhub/acquisition/tachograph/imports/start now creates:

1 import_run row
N planned data_package rows, one per extraction definition and time chunk

The SQL extraction routes are intentionally separated from run planning. They should pick planned extraction packages, execute the corresponding SQL, map rows to EventHubEventDto, set sourcePackageRef when known, and send them to direct:eventhub-normalized-input.

Initial import

Example initial import request:

{
  "tenantKey": "kralowetz",
  "eventSource": {
    "providerKey": "TACHOGRAPH",
    "sourceKind": "MIXED",
    "sourceKey": "TACHOGRAPH_DB",
    "sourceInstanceKey": "tachograph-prod-db",
    "tenantProviderSettingKey": "kralowetz-tachograph-prod"
  },
  "sourceGroup": {
    "type": "ORGANISATION",
    "sourceEntityId": "147",
    "code": "147",
    "name": "Kralowetz"
  },
  "importScope": {
    "type": "SOURCE_ORGANISATION_SUBTREE",
    "rootSourceOrganisation": {
      "type": "ORGANISATION",
      "sourceEntityId": "147"
    },
    "includeChildren": true,
    "occurredFrom": "2026-01-01T00:00:00+01:00",
    "occurredTo": "2026-02-01T00:00:00+01:00"
  },
  "eventFamilies": [
    "DRIVER_ACTIVITY",
    "DRIVER_CARD",
    "POSITION",
    "BORDER_CROSSING",
    "LOAD_UNLOAD",
    "PLACE",
    "SPECIFIC_CONDITION",
    "SPEEDING"
  ],
  "mode": "INITIAL_BACKFILL",
  "refreshMasterDataFirst": true,
  "acquisitionStrategy": "OCCURRED_AT_WINDOW_WITH_OVERLAP"
}

The plan service chunks the occurred-time range using eventhub.tachograph.default-chunk-days.

Regular update

For regular updates, the preferred mode is:

{
  "mode": "INCREMENTAL_UPDATE",
  "refreshMasterDataFirst": true,
  "acquisitionStrategy": "SOURCE_PACKAGE_WATERMARK"
}

This is preferred because newly imported original driver-card/VU packages can contain older occurredAt events. A simple occurredAt watermark would miss such late-arriving historical data. The eventhub.import_cursor table stores source-package, source-row and occurredAt fallback watermarks per tenant/source/scope/event family/source kind.

Extraction route contract

A future concrete SQL extraction route should do this:

planned data_package
  -> execute SQL for extraction_code and chunk/import scope
  -> map source rows to EventHubEventDto
  -> populate sourcePackageRef if source package metadata is available
  -> send to direct:eventhub-normalized-input
  -> only advance eventhub.import_cursor after successful import