# TED Daily Package Download - Implementation

## Overview

The system automatically downloads TED Daily Packages and processes them.

## Components

### 1. Entity: TedDailyPackage ✅

- Tracking of downloads
- Status management
- Idempotency via hash

### 2. Repository: TedDailyPackageRepository ✅

- Package management
- Status queries
- Lookup of the latest package

### 3. Configuration: DownloadProperties ✅

- Download settings
- URL configuration
- Rate limiting

### 4. Service: TedPackageDownloadService (in progress)

- Package download
- tar.gz extraction
- Progress tracking

### 5. Camel Route: TedPackageDownloadRoute (pending)

- Scheduled downloads
- Error handling
- Integration with the existing XML processing

## Workflow

1. **Initialization**
   - Determine the latest package from the DB
   - Compute the starting point (current year, or latest package + 1)
2. **Download Loop**
   - Current year: start at the latest serial number + 1 and continue until 404 responses occur (stop after at most 4 consecutive 404s, see `max-consecutive-404`)
   - Previous years: download backwards, at a slow rate
3. **Package Processing**
   - Download the tar.gz
   - Compute the hash (SHA-256)
   - Check the hash against the DB (idempotency)
   - Extract the XML files
   - Forward them to the XML processing route
4. **Status Tracking**
   - PENDING → DOWNLOADING → DOWNLOADED → PROCESSING → COMPLETED
   - Error handling: FAILED, NOT_FOUND

## Configuration (application.yml)

```yaml
ted:
  download:
    enabled: true
    base-url: https://ted.europa.eu/packages/daily/
    download-directory: D:/ted.europe/downloads
    extract-directory: D:/ted.europe/extracted
    start-year: 2024
    max-consecutive-404: 4
    poll-interval: 3600000          # 1 hour
    download-timeout: 300000        # 5 minutes
    max-concurrent-downloads: 2
    delay-between-downloads: 5000   # 5 seconds
    delete-after-extraction: true
    prioritize-current-year: true
```

## Database Migration

```sql
CREATE TABLE TED.ted_daily_package (
    id UUID PRIMARY KEY,
    package_identifier VARCHAR(20) NOT NULL UNIQUE,
    year INTEGER NOT NULL,
    serial_number INTEGER NOT NULL,
    download_url VARCHAR(500) NOT NULL,
    file_hash VARCHAR(64),
    xml_file_count INTEGER,
    processed_count INTEGER DEFAULT 0,
    failed_count INTEGER DEFAULT 0,
    download_status VARCHAR(30) NOT NULL DEFAULT 'PENDING',
    error_message TEXT,
    downloaded_at TIMESTAMP WITH TIME ZONE,
    processed_at TIMESTAMP WITH TIME ZONE,
    download_duration_ms BIGINT,
    processing_duration_ms BIGINT,
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(year, serial_number)
);

CREATE INDEX idx_package_identifier ON TED.ted_daily_package(package_identifier);
CREATE INDEX idx_package_year_serial ON TED.ted_daily_package(year, serial_number);
CREATE INDEX idx_package_status ON TED.ted_daily_package(download_status);
CREATE INDEX idx_package_downloaded_at ON TED.ted_daily_package(downloaded_at);
```

## Next Steps

1. Finish the package download service
2. Create the Camel route
3. Run the database migration
4. Testing & integration
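
## Sketch: Current-Year Download Loop

The current-year part of the workflow (start at the latest serial number + 1, stop after 4 consecutive 404s, skip packages whose SHA-256 hash is already known) could look roughly like the following. This is only a minimal sketch and not the actual `TedPackageDownloadService`: the class `DownloadLoopSketch`, its method names, and the `fetch` callback are hypothetical. The callback stands in for the HTTP download (an empty `Optional` represents a 404), and the `knownHashes` set stands in for the `file_hash` lookup in the DB.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.IntFunction;

/** Hypothetical sketch of the current-year download loop; not the real service. */
final class DownloadLoopSketch {

    /** Hex-encoded SHA-256 digest: 64 characters, which fits file_hash VARCHAR(64). */
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every conforming Java platform.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Walks serial numbers upwards from firstSerial until maxConsecutive404
     * misses occur in a row. Returns the serial numbers whose content was new.
     *
     * Assumption: fetch returns Optional.empty() for a 404 and the package
     * bytes otherwise. knownHashes is updated in place (stand-in for the DB).
     */
    static List<Integer> run(int firstSerial,
                             int maxConsecutive404,
                             IntFunction<Optional<byte[]>> fetch,
                             Set<String> knownHashes) {
        List<Integer> accepted = new ArrayList<>();
        int consecutive404 = 0;
        for (int serial = firstSerial; consecutive404 < maxConsecutive404; serial++) {
            Optional<byte[]> body = fetch.apply(serial);
            if (!body.isPresent()) {
                consecutive404++;        // NOT_FOUND: count the miss and try the next serial
                continue;
            }
            consecutive404 = 0;          // a hit resets the counter (gaps are tolerated)
            String hash = sha256Hex(body.get());
            if (knownHashes.add(hash)) { // add() returns false if the hash was already known
                accepted.add(serial);    // new content: would go on to extraction
            }
        }
        return accepted;
    }
}
```

Counting only consecutive misses lets the loop tolerate gaps in the serial numbers (for example days without a publication) while still terminating once the newest package has been passed. In the real service, the pause from `delay-between-downloads` would sit between iterations; it is omitted here to keep the sketch short.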