14 KiB
Import Performance and Error Fixes
Scope
This document summarizes the event-ingest, master-data, schema, locking, retry, and correctness fixes made in the current optimization round for the EventHub import pipeline.
It covers:
- event schema and migration fixes
- event import throughput fixes
- master-data import throughput fixes
- deadlock and transaction-visibility fixes
- connection-reset and retry handling
- async import cursor correctness fixes
- logging and operational visibility improvements
Main committed changes:
2e6e1aa-Optimize ingestion pipeline and reduce import contentionbd3620b-Improve vehicle reference caching during ingest
Additional related fixes are currently present in the workspace but may not yet be committed.
1. Schema and Migration Fixes
1.1 event_detail / hypertable ordering
Problem:
- executing
eventhub_schema_create.sqlon an empty database failed withrelation "eventhub.event_detail" does not exist
Fix:
- create
eventhub.event_detailbeforecreate_hypertable(...) - add its foreign key after the hypertable conversion
Files:
src/main/resources/db/eventhub_schema_create.sql
1.2 Explicit migrations for event hypertable and source-record support
Problem:
- the runtime schema evolution needed explicit migrations for hypertable conversion and
event_source_record
Fix:
- add migration for
source_package_idonevent - add migration for
eventhypertable conversion and FK recreation - add migration to ensure
event_source_recordexists and is backfilled
Files:
src/main/resources/db/migration/V9__add_event_source_package_id.sqlsrc/main/resources/db/migration/V10__make_event_hypertable.sqlsrc/main/resources/db/migration/V11__ensure_event_source_record.sql
2. Event Import Throughput Fixes
2.1 Replace per-event inserts with staged set-based writes
Problem:
EventRepository.batchInsert(...)originally processed events one by one despite the batch API- this caused one insert/query cycle per event and poor throughput
Fix:
- stage a whole ingest batch into
eventhub_event_import_stage - reserve
event_source_recordrows set-wise - insert
eventhub.eventrows set-wise - upsert
eventhub.event_detailrows in batch
Files:
src/main/java/at/procon/eventhub/persistence/EventRepository.java
2.2 Fix missing event rows when source records were reserved
Problem:
- after the set-based refactor, some runs created
event_source_recordrows without creatingeventrows
Cause:
- the insert statement reserved source records and then tried to re-read them through the base table in the same data-modifying CTE chain
Fix:
- use the
RETURNINGrows from the source-record reservation CTE directly - also support already-existing source records that still miss the
eventrow
Files:
src/main/java/at/procon/eventhub/persistence/EventRepository.java
2.3 Stream extraction instead of materializing full result sets
Problem:
- extraction loaded full SQL chunks into memory before handing them to Camel
Fix:
- stream rows directly from JDBC to
direct:eventhub-normalized-input - keep only counters and watermark information
Files:
src/main/java/at/procon/eventhub/importing/extraction/AbstractJdbcExtractionBatchExecutor.javasrc/main/java/at/procon/eventhub/tachograph/service/JdbcTachographExtractionBatchExecutor.java
2.4 Increase batch size and enable parallel queue draining
Problem:
- the async ingest route drained too slowly with
1000-event batches and a single consumer
Fix:
- raise Camel completion size from
1000to5000 - enable
4concurrent SEDA consumers
Files:
src/main/java/at/procon/eventhub/config/EventHubProperties.javasrc/main/java/at/procon/eventhub/camel/EventHubCommonIngestionRoute.javasrc/main/resources/application.yml
2.5 Give each Camel flush its own package key
Problem:
- multiple flushes of the same extraction package reused the same
data_packageidentity - logs were misleading and
event_counton the package row was overwritten by later flushes
Fix:
- derive a unique
packageKeyper completed Camel batch using the aggregate package key plus the Camel exchange id - preserve both the aggregate key and the child key in metadata
Files:
src/main/java/at/procon/eventhub/camel/EventHubBatchBuildProcessor.java
2.6 Improve batch-local entity and vehicle caching
Problem:
- after the bulk insert refactor, the main remaining hot path was still reference resolution
Fixes:
- cache entity ids in the batch by
entityType + sourceEntityId - cache vehicle resolutions inside a batch
- later extend vehicle caching to be range-aware for registration-based assignment lookups:
- direct vehicle identifiers cache without time sensitivity
- registration-based resolutions cache over assignment validity intervals
Files:
src/main/java/at/procon/eventhub/persistence/EventRepository.javasrc/main/java/at/procon/eventhub/persistence/VehicleIdentityRepository.java
3. Master-Data Import Throughput Fixes
3.1 Set-based master entity and relation upserts
Problem:
- source master data was previously written row by row
Fix:
- stage master entities and relations into temporary tables
- run set-based
insert ... select ... on conflict do update
Files:
src/main/java/at/procon/eventhub/persistence/SourceMasterDataRepository.java
3.2 Stream and chunk master-data refresh
Problem:
- the refresh path loaded large source master-data result sets into memory
Fix:
- stream source rows
- flush in chunks of
5000
Files:
src/main/java/at/procon/eventhub/tachograph/service/TachographMasterDataRefreshService.java
3.3 Bulk vehicle reconciliation from master data
Problem:
- reconciling vehicles and registrations from master data was done row by row
Fix:
- replace the loop with set-based SQL for vehicles, registrations, and projected assignments
Files:
src/main/java/at/procon/eventhub/persistence/VehicleIdentityRepository.java
4. Deadlock and Contention Fixes
4.1 Remove unnecessary hot-row updates on vehicle and registration rows
Problem:
- event import updated
vehicle.updated_atandvehicle_registration.updated_ateven when no new information was being added - this created deadlocks under parallel ingest
Fix:
- only update
vehiclewhen missingsource_vehicle_entity_idorvincan actually be filled - only update
vehicle_registrationwhen missing source id, nation, or registration number can actually be filled - stop using event import as a generic "touch row" path
Files:
src/main/java/at/procon/eventhub/persistence/VehicleIdentityRepository.java
4.2 Make event-time source master entity resolution "find or create", not "update on conflict"
Problem:
- concurrent event batches could deadlock on
eventhub.source_master_entitythroughINSERT ... ON CONFLICT DO UPDATE
Fix:
- first
SELECT id - if missing,
INSERT ... ON CONFLICT DO NOTHING RETURNING id - if another transaction won the race, select again
- do not update existing master entity rows during event ingest
Files:
src/main/java/at/procon/eventhub/persistence/SourceMasterDataRepository.java
4.3 Fix race handling when RETURNING returns no row
Problem:
- if a concurrent transaction inserted the entity first, the resolver could still fail unexpectedly
Fix:
- allow the
RETURNINGpath to yieldnull - retry with a follow-up
SELECT
Files:
src/main/java/at/procon/eventhub/persistence/SourceMasterDataRepository.java
5. Transaction Visibility and Correctness Fixes
5.1 Remove outer transaction around full tachograph execution
Problem:
- master-data refresh logs showed completion, but master-data rows were not visible yet because the outer import method still held the transaction open
Fix:
- remove the outer transaction from
startAndExecuteImport(...) - keep chunk-level and package-level transactions independent
Files:
src/main/java/at/procon/eventhub/tachograph/service/TachographImportExecutionService.java
5.2 Preserve the original ingest exception if package failure marking also fails
Problem:
- when ingest failed and
markFailed(...)also failed because of a broken connection, the secondary bookkeeping error hid the real root cause
Fix:
- wrap
dataPackageRepository.markFailed(...)in its owntry/catch - log the bookkeeping failure
- keep the original ingest exception and attach the bookkeeping failure as suppressed
Files:
src/main/java/at/procon/eventhub/service/EventHubIngestionService.java
5.3 Do not advance import cursors before async ingest really finishes
Problem:
- extraction previously marked packages imported and advanced
import_cursorbefore the asyncCAMEL_BATCHingest was durably finished - this could skip source data on the next run if async ingest later failed
Fix:
- add grouped child-batch status lookup on
data_package - make extraction package completion wait for all derived
CAMEL_BATCHrows to reach terminal success - fail the planned extraction package if child batches fail or time out
- only advance the cursor after the async ingest succeeded
- make "Completed import run" mean durable ingest completion instead of extraction completion
Files:
src/main/java/at/procon/eventhub/importing/AbstractImportExecutionService.javasrc/main/java/at/procon/eventhub/persistence/DataPackageRepository.javasrc/main/java/at/procon/eventhub/tachograph/service/TachographImportExecutionService.java
6. Connection Reset and Retry Hardening
6.1 Retry transient DB failures in the Camel ingest route
Problem:
- long-running imports hit transient failures such as deadlocks and connection resets
Fix:
- add Camel redelivery with exponential backoff for:
CannotAcquireLockExceptionPessimisticLockingFailureExceptionDataAccessResourceFailureExceptionTransientDataAccessException
Files:
src/main/java/at/procon/eventhub/camel/EventHubCommonIngestionRoute.java
6.2 Tune Hikari for shorter-lived and healthier pooled connections
Problem:
SQLSTATE 08006/Connection resetevents left broken pool entries behind
Fix:
- configure Hikari with explicit pool sizing and connection lifetime / keepalive settings:
maximum-pool-size: 16minimum-idle: 4connection-timeout: 30000validation-timeout: 5000idle-timeout: 300000keepalive-time: 120000max-lifetime: 540000
Files:
src/main/resources/application.yml
7. Observability Improvements
7.1 Master-data progress logging
Added logs for:
- refresh start
- per-section progress
- per-chunk counts
byTypebreakdowns- section completion
- reconciliation start and result
Files:
src/main/java/at/procon/eventhub/tachograph/service/TachographMasterDataRefreshService.java
7.2 Event extraction progress logging
Added logs for:
- extraction start
- progress every
5000mapped events - final mapped totals with
byType
Files:
src/main/java/at/procon/eventhub/importing/extraction/AbstractJdbcExtractionBatchExecutor.java
7.3 Event ingest throughput logging
Added logs for:
receivedCountinsertedCountelapsedMsreceivedPerSecondbyType
Files:
src/main/java/at/procon/eventhub/service/EventHubIngestionService.java
7.4 Async-ingest wait progress logging
Added logs for:
- number of expected child batches
- observed child batches
- successful / failed / importing child-batch counts while the import executor waits for durable completion
Files:
src/main/java/at/procon/eventhub/importing/AbstractImportExecutionService.java
8. Operational Notes
Throughput effect seen during the optimization round
Observed progression during the work:
- roughly
30events/sec before the later cache and blocking fixes - roughly
300rows/sec after the main contention and stuck-session cleanup work
This is a major improvement, but large historical backfills are still expensive.
What remains expensive
The main remaining bottleneck is still reference resolution in the ingest hot path, especially:
- driver entity resolution
- source-package entity resolution
- vehicle / registration lookup and creation
The next major optimization step would be set-based pre-resolution of references per ingest batch instead of resolving them one event at a time.
Safe rerun behavior
- event ingest remains idempotent through
event_source_record.source_record_key_hash - already imported events should generally be kept
- when historical cursor corruption existed, repair should target
import_cursor, not wholesale deletion of imported events
9. Main Files Touched
src/main/java/at/procon/eventhub/persistence/EventRepository.javasrc/main/java/at/procon/eventhub/persistence/VehicleIdentityRepository.javasrc/main/java/at/procon/eventhub/persistence/SourceMasterDataRepository.javasrc/main/java/at/procon/eventhub/persistence/DataPackageRepository.javasrc/main/java/at/procon/eventhub/service/EventHubIngestionService.javasrc/main/java/at/procon/eventhub/camel/EventHubCommonIngestionRoute.javasrc/main/java/at/procon/eventhub/camel/EventHubBatchBuildProcessor.javasrc/main/java/at/procon/eventhub/importing/extraction/AbstractJdbcExtractionBatchExecutor.javasrc/main/java/at/procon/eventhub/importing/AbstractImportExecutionService.javasrc/main/java/at/procon/eventhub/tachograph/service/JdbcTachographExtractionBatchExecutor.javasrc/main/java/at/procon/eventhub/tachograph/service/TachographMasterDataRefreshService.javasrc/main/java/at/procon/eventhub/tachograph/service/TachographImportExecutionService.javasrc/main/java/at/procon/eventhub/config/EventHubProperties.javasrc/main/resources/application.ymlsrc/main/resources/db/eventhub_schema_create.sqlsrc/main/resources/db/migration/V9__add_event_source_package_id.sqlsrc/main/resources/db/migration/V10__make_event_hypertable.sqlsrc/main/resources/db/migration/V11__ensure_event_source_record.sql