# Import Performance and Error Fixes ## Scope This document summarizes the event-ingest, master-data, schema, locking, retry, and correctness fixes made in the current optimization round for the EventHub import pipeline. It covers: - event schema and migration fixes - event import throughput fixes - master-data import throughput fixes - deadlock and transaction-visibility fixes - connection-reset and retry handling - async import cursor correctness fixes - logging and operational visibility improvements Main committed changes: - `2e6e1aa` - `Optimize ingestion pipeline and reduce import contention` - `bd3620b` - `Improve vehicle reference caching during ingest` Additional related fixes are currently present in the workspace but may not yet be committed. ## 1. Schema and Migration Fixes ### 1.1 `event_detail` / hypertable ordering Problem: - executing `eventhub_schema_create.sql` on an empty database failed with `relation "eventhub.event_detail" does not exist` Fix: - create `eventhub.event_detail` before `create_hypertable(...)` - add its foreign key after the hypertable conversion Files: - `src/main/resources/db/eventhub_schema_create.sql` ### 1.2 Explicit migrations for event hypertable and source-record support Problem: - the runtime schema evolution needed explicit migrations for hypertable conversion and `event_source_record` Fix: - add migration for `source_package_id` on `event` - add migration for `event` hypertable conversion and FK recreation - add migration to ensure `event_source_record` exists and is backfilled Files: - `src/main/resources/db/migration/V9__add_event_source_package_id.sql` - `src/main/resources/db/migration/V10__make_event_hypertable.sql` - `src/main/resources/db/migration/V11__ensure_event_source_record.sql` ## 2. Event Import Throughput Fixes ### 2.1 Replace per-event inserts with staged set-based writes Problem: - `EventRepository.batchInsert(...)` originally processed events one by one despite the batch API - this caused one insert/query cycle per event and poor throughput Fix: - stage a whole ingest batch into `eventhub_event_import_stage` - reserve `event_source_record` rows set-wise - insert `eventhub.event` rows set-wise - upsert `eventhub.event_detail` rows in batch Files: - `src/main/java/at/procon/eventhub/persistence/EventRepository.java` ### 2.2 Fix missing event rows when source records were reserved Problem: - after the set-based refactor, some runs created `event_source_record` rows without creating `event` rows Cause: - the insert statement reserved source records and then tried to re-read them through the base table in the same data-modifying CTE chain Fix: - use the `RETURNING` rows from the source-record reservation CTE directly - also support already-existing source records that still miss the `event` row Files: - `src/main/java/at/procon/eventhub/persistence/EventRepository.java` ### 2.3 Stream extraction instead of materializing full result sets Problem: - extraction loaded full SQL chunks into memory before handing them to Camel Fix: - stream rows directly from JDBC to `direct:eventhub-normalized-input` - keep only counters and watermark information Files: - `src/main/java/at/procon/eventhub/importing/extraction/AbstractJdbcExtractionBatchExecutor.java` - `src/main/java/at/procon/eventhub/tachograph/service/JdbcTachographExtractionBatchExecutor.java` ### 2.4 Increase batch size and enable parallel queue draining Problem: - the async ingest route drained too slowly with `1000`-event batches and a single consumer Fix: - raise Camel completion size from `1000` to `5000` - enable `4` concurrent SEDA consumers Files: - `src/main/java/at/procon/eventhub/config/EventHubProperties.java` - `src/main/java/at/procon/eventhub/camel/EventHubCommonIngestionRoute.java` - `src/main/resources/application.yml` ### 2.5 Give each Camel flush its own package key Problem: - multiple flushes of the same extraction package reused the same `data_package` identity - logs were misleading and `event_count` on the package row was overwritten by later flushes Fix: - derive a unique `packageKey` per completed Camel batch using the aggregate package key plus the Camel exchange id - preserve both the aggregate key and the child key in metadata Files: - `src/main/java/at/procon/eventhub/camel/EventHubBatchBuildProcessor.java` ### 2.6 Improve batch-local entity and vehicle caching Problem: - after the bulk insert refactor, the main remaining hot path was still reference resolution Fixes: - cache entity ids in the batch by `entityType + sourceEntityId` - cache vehicle resolutions inside a batch - later extend vehicle caching to be range-aware for registration-based assignment lookups: - direct vehicle identifiers cache without time sensitivity - registration-based resolutions cache over assignment validity intervals Files: - `src/main/java/at/procon/eventhub/persistence/EventRepository.java` - `src/main/java/at/procon/eventhub/persistence/VehicleIdentityRepository.java` ## 3. Master-Data Import Throughput Fixes ### 3.1 Set-based master entity and relation upserts Problem: - source master data was previously written row by row Fix: - stage master entities and relations into temporary tables - run set-based `insert ... select ... on conflict do update` Files: - `src/main/java/at/procon/eventhub/persistence/SourceMasterDataRepository.java` ### 3.2 Stream and chunk master-data refresh Problem: - the refresh path loaded large source master-data result sets into memory Fix: - stream source rows - flush in chunks of `5000` Files: - `src/main/java/at/procon/eventhub/tachograph/service/TachographMasterDataRefreshService.java` ### 3.3 Bulk vehicle reconciliation from master data Problem: - reconciling vehicles and registrations from master data was done row by row Fix: - replace the loop with set-based SQL for vehicles, registrations, and projected assignments Files: - `src/main/java/at/procon/eventhub/persistence/VehicleIdentityRepository.java` ## 4. Deadlock and Contention Fixes ### 4.1 Remove unnecessary hot-row updates on vehicle and registration rows Problem: - event import updated `vehicle.updated_at` and `vehicle_registration.updated_at` even when no new information was being added - this created deadlocks under parallel ingest Fix: - only update `vehicle` when missing `source_vehicle_entity_id` or `vin` can actually be filled - only update `vehicle_registration` when missing source id, nation, or registration number can actually be filled - stop using event import as a generic "touch row" path Files: - `src/main/java/at/procon/eventhub/persistence/VehicleIdentityRepository.java` ### 4.2 Make event-time source master entity resolution "find or create", not "update on conflict" Problem: - concurrent event batches could deadlock on `eventhub.source_master_entity` through `INSERT ... ON CONFLICT DO UPDATE` Fix: - first `SELECT id` - if missing, `INSERT ... ON CONFLICT DO NOTHING RETURNING id` - if another transaction won the race, select again - do not update existing master entity rows during event ingest Files: - `src/main/java/at/procon/eventhub/persistence/SourceMasterDataRepository.java` ### 4.3 Fix race handling when `RETURNING` returns no row Problem: - if a concurrent transaction inserted the entity first, the resolver could still fail unexpectedly Fix: - allow the `RETURNING` path to yield `null` - retry with a follow-up `SELECT` Files: - `src/main/java/at/procon/eventhub/persistence/SourceMasterDataRepository.java` ## 5. Transaction Visibility and Correctness Fixes ### 5.1 Remove outer transaction around full tachograph execution Problem: - master-data refresh logs showed completion, but master-data rows were not visible yet because the outer import method still held the transaction open Fix: - remove the outer transaction from `startAndExecuteImport(...)` - keep chunk-level and package-level transactions independent Files: - `src/main/java/at/procon/eventhub/tachograph/service/TachographImportExecutionService.java` ### 5.2 Preserve the original ingest exception if package failure marking also fails Problem: - when ingest failed and `markFailed(...)` also failed because of a broken connection, the secondary bookkeeping error hid the real root cause Fix: - wrap `dataPackageRepository.markFailed(...)` in its own `try/catch` - log the bookkeeping failure - keep the original ingest exception and attach the bookkeeping failure as suppressed Files: - `src/main/java/at/procon/eventhub/service/EventHubIngestionService.java` ### 5.3 Do not advance import cursors before async ingest really finishes Problem: - extraction previously marked packages imported and advanced `import_cursor` before the async `CAMEL_BATCH` ingest was durably finished - this could skip source data on the next run if async ingest later failed Fix: - add grouped child-batch status lookup on `data_package` - make extraction package completion wait for all derived `CAMEL_BATCH` rows to reach terminal success - fail the planned extraction package if child batches fail or time out - only advance the cursor after the async ingest succeeded - make "Completed import run" mean durable ingest completion instead of extraction completion Files: - `src/main/java/at/procon/eventhub/importing/AbstractImportExecutionService.java` - `src/main/java/at/procon/eventhub/persistence/DataPackageRepository.java` - `src/main/java/at/procon/eventhub/tachograph/service/TachographImportExecutionService.java` ## 6. Connection Reset and Retry Hardening ### 6.1 Retry transient DB failures in the Camel ingest route Problem: - long-running imports hit transient failures such as deadlocks and connection resets Fix: - add Camel redelivery with exponential backoff for: - `CannotAcquireLockException` - `PessimisticLockingFailureException` - `DataAccessResourceFailureException` - `TransientDataAccessException` Files: - `src/main/java/at/procon/eventhub/camel/EventHubCommonIngestionRoute.java` ### 6.2 Tune Hikari for shorter-lived and healthier pooled connections Problem: - `SQLSTATE 08006` / `Connection reset` events left broken pool entries behind Fix: - configure Hikari with explicit pool sizing and connection lifetime / keepalive settings: - `maximum-pool-size: 16` - `minimum-idle: 4` - `connection-timeout: 30000` - `validation-timeout: 5000` - `idle-timeout: 300000` - `keepalive-time: 120000` - `max-lifetime: 540000` Files: - `src/main/resources/application.yml` ## 7. Observability Improvements ### 7.1 Master-data progress logging Added logs for: - refresh start - per-section progress - per-chunk counts - `byType` breakdowns - section completion - reconciliation start and result Files: - `src/main/java/at/procon/eventhub/tachograph/service/TachographMasterDataRefreshService.java` ### 7.2 Event extraction progress logging Added logs for: - extraction start - progress every `5000` mapped events - final mapped totals with `byType` Files: - `src/main/java/at/procon/eventhub/importing/extraction/AbstractJdbcExtractionBatchExecutor.java` ### 7.3 Event ingest throughput logging Added logs for: - `receivedCount` - `insertedCount` - `elapsedMs` - `receivedPerSecond` - `byType` Files: - `src/main/java/at/procon/eventhub/service/EventHubIngestionService.java` ### 7.4 Async-ingest wait progress logging Added logs for: - number of expected child batches - observed child batches - successful / failed / importing child-batch counts while the import executor waits for durable completion Files: - `src/main/java/at/procon/eventhub/importing/AbstractImportExecutionService.java` ## 8. Operational Notes ### Throughput effect seen during the optimization round Observed progression during the work: - roughly `30` events/sec before the later cache and blocking fixes - roughly `300` rows/sec after the main contention and stuck-session cleanup work This is a major improvement, but large historical backfills are still expensive. ### What remains expensive The main remaining bottleneck is still reference resolution in the ingest hot path, especially: - driver entity resolution - source-package entity resolution - vehicle / registration lookup and creation The next major optimization step would be set-based pre-resolution of references per ingest batch instead of resolving them one event at a time. ### Safe rerun behavior - event ingest remains idempotent through `event_source_record.source_record_key_hash` - already imported events should generally be kept - when historical cursor corruption existed, repair should target `import_cursor`, not wholesale deletion of imported events ## 9. Main Files Touched - `src/main/java/at/procon/eventhub/persistence/EventRepository.java` - `src/main/java/at/procon/eventhub/persistence/VehicleIdentityRepository.java` - `src/main/java/at/procon/eventhub/persistence/SourceMasterDataRepository.java` - `src/main/java/at/procon/eventhub/persistence/DataPackageRepository.java` - `src/main/java/at/procon/eventhub/service/EventHubIngestionService.java` - `src/main/java/at/procon/eventhub/camel/EventHubCommonIngestionRoute.java` - `src/main/java/at/procon/eventhub/camel/EventHubBatchBuildProcessor.java` - `src/main/java/at/procon/eventhub/importing/extraction/AbstractJdbcExtractionBatchExecutor.java` - `src/main/java/at/procon/eventhub/importing/AbstractImportExecutionService.java` - `src/main/java/at/procon/eventhub/tachograph/service/JdbcTachographExtractionBatchExecutor.java` - `src/main/java/at/procon/eventhub/tachograph/service/TachographMasterDataRefreshService.java` - `src/main/java/at/procon/eventhub/tachograph/service/TachographImportExecutionService.java` - `src/main/java/at/procon/eventhub/config/EventHubProperties.java` - `src/main/resources/application.yml` - `src/main/resources/db/eventhub_schema_create.sql` - `src/main/resources/db/migration/V9__add_event_source_package_id.sql` - `src/main/resources/db/migration/V10__make_event_hypertable.sql` - `src/main/resources/db/migration/V11__ensure_event_source_record.sql`