# TED Daily Package Download - Implementation

## Overview

The system automatically downloads TED Daily Packages and processes them.

## Components

### 1. Entity: TedDailyPackage ✅
- Tracks downloads
- Status management
- Idempotence via file hash

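The hash-based idempotence could look roughly like the sketch below. It uses only the JDK; `PackageHashRegistry`, `sha256Hex`, and `registerIfNew` are hypothetical names, and the in-memory set merely stands in for a lookup against the stored `file_hash` values.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: hashes package bytes and remembers hashes already seen. */
class PackageHashRegistry {

    // Stand-in for the file_hash column; a real implementation would ask the repository.
    private final Set<String> knownHashes = new HashSet<>();

    /** Returns the SHA-256 digest of the content as 64 lowercase hex characters. */
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder(64);
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to ship SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /** Returns true only the first time this exact content is registered. */
    boolean registerIfNew(byte[] content) {
        return knownHashes.add(sha256Hex(content));
    }
}
```

A second call to `registerIfNew` with identical bytes returns `false`, which is the signal to skip reprocessing.
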
### 2. Repository: TedDailyPackageRepository ✅
- Package management
- Status queries
- Lookup of the latest package

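As a rough illustration of the latest-package lookup and the start point derived from it, here is a framework-free sketch. `PackageKey` and `StartPointCalculator` are hypothetical names, the list stands in for the database, and the rule "continue after the latest package of the current year, otherwise begin at serial number 1" is one possible reading of the workflow, not a confirmed specification.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical value type mirroring the year and serial_number columns. */
final class PackageKey {
    final int year;
    final int serialNumber;

    PackageKey(int year, int serialNumber) {
        this.year = year;
        this.serialNumber = serialNumber;
    }
}

class StartPointCalculator {

    /** Picks the package with the highest (year, serialNumber) pair, if any exist. */
    static Optional<PackageKey> findLatest(List<PackageKey> stored) {
        return stored.stream()
                .max(Comparator.comparingInt((PackageKey k) -> k.year)
                        .thenComparingInt(k -> k.serialNumber));
    }

    /**
     * Assumed rule: continue right after the latest package of the current year,
     * otherwise begin with serial number 1 of the current year.
     */
    static PackageKey nextStart(List<PackageKey> stored, int currentYear) {
        return findLatest(stored)
                .filter(latest -> latest.year == currentYear)
                .map(latest -> new PackageKey(latest.year, latest.serialNumber + 1))
                .orElse(new PackageKey(currentYear, 1));
    }
}
```

In the real repository the lookup would more likely be a single ordered query than an in-memory scan.
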
### 3. Configuration: DownloadProperties ✅
- Download settings
- URL configuration
- Rate limiting

### 4. Service: TedPackageDownloadService (in progress)
- Package download
- tar.gz extraction
- Progress tracking

### 5. Camel Route: TedPackageDownloadRoute (pending)
- Scheduled downloads
- Error handling
- Integration with the existing XML processing

## Workflow

1. **Initialization**
   - Determine the latest package from the database
   - Compute the starting point (current year, or latest package + 1)

2. **Download loop**
   - Current year: start at the latest package + 1 and continue until 404 responses occur (at most 4 in a row)
   - Previous years: download backwards, at a slow pace

3. **Package processing**
   - Download the tar.gz
   - Compute the hash (SHA-256)
   - Check against the database (idempotence)
   - Extract the XML files
   - Forward them to the XML processing route

4. **Status tracking**
   - PENDING → DOWNLOADING → DOWNLOADED → PROCESSING → COMPLETED
   - Error handling: FAILED, NOT_FOUND

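The current-year rule of the download loop can be sketched as follows. This is a minimal illustration, not the service's actual code: `CurrentYearDownloadLoop` and `statusForSerial` are hypothetical, the function argument stands in for the real HTTP request, and resetting the counter after a successful download is an assumption drawn from the word "consecutive" in the `max-consecutive-404` setting. The pause between downloads is left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

class CurrentYearDownloadLoop {

    /**
     * Walks serial numbers upwards from startSerial and stops once
     * maxConsecutive404 "not found" responses have occurred in a row.
     *
     * @param statusForSerial hypothetical stand-in for the HTTP call;
     *                        maps a serial number to an HTTP status code
     * @return the serial numbers that answered with 200
     */
    static List<Integer> run(int startSerial, int maxConsecutive404,
                             IntUnaryOperator statusForSerial) {
        List<Integer> downloaded = new ArrayList<>();
        int consecutive404 = 0;
        int serial = startSerial;
        while (consecutive404 < maxConsecutive404) {
            int status = statusForSerial.applyAsInt(serial);
            if (status == 404) {
                consecutive404++;      // gap or end of the published range
            } else if (status == 200) {
                consecutive404 = 0;    // a hit resets the counter (assumption)
                downloaded.add(serial);
            } else {
                throw new IllegalStateException(
                        "Unexpected HTTP status " + status + " for serial " + serial);
            }
            serial++;
        }
        return downloaded;
    }
}
```

With a limit of 4, a single missing serial number in the middle of the year does not end the loop; only four misses in a row do.
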
## Configuration (application.yml)

```yaml
ted:
  download:
    enabled: true
    base-url: https://ted.europa.eu/packages/daily/
    download-directory: D:/ted.europe/downloads
    extract-directory: D:/ted.europe/extracted
    start-year: 2024
    max-consecutive-404: 4
    poll-interval: 3600000 # 1 hour (in ms)
    download-timeout: 300000 # 5 minutes (in ms)
    max-concurrent-downloads: 2
    delay-between-downloads: 5000 # 5 seconds (in ms)
    delete-after-extraction: true
    prioritize-current-year: true
```

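To make the units of these settings explicit in code, a plain holder such as the one below could convert the millisecond values to `Duration`. This is only a sketch: `DownloadSettings` and `packageUrl` are hypothetical and not the project's `DownloadProperties`, the class covers only some of the keys, and appending the package identifier directly to `base-url` is an assumed URL scheme (the identifier `202400001` in the usage note is likewise just an illustrative value).

```java
import java.net.URI;
import java.time.Duration;

/** Hypothetical, framework-free holder for a few of the ted.download settings. */
class DownloadSettings {
    final URI baseUrl;
    final int maxConsecutive404;
    final Duration pollInterval;
    final Duration downloadTimeout;
    final Duration delayBetweenDownloads;

    DownloadSettings(String baseUrl, int maxConsecutive404, long pollIntervalMs,
                     long downloadTimeoutMs, long delayBetweenDownloadsMs) {
        if (maxConsecutive404 < 1) {
            throw new IllegalArgumentException("max-consecutive-404 must be at least 1");
        }
        this.baseUrl = URI.create(baseUrl);
        this.maxConsecutive404 = maxConsecutive404;
        // The YAML values are plain millisecond counts.
        this.pollInterval = Duration.ofMillis(pollIntervalMs);
        this.downloadTimeout = Duration.ofMillis(downloadTimeoutMs);
        this.delayBetweenDownloads = Duration.ofMillis(delayBetweenDownloadsMs);
    }

    /** Assumed URL scheme: the package identifier is appended to the base URL. */
    URI packageUrl(String packageIdentifier) {
        return baseUrl.resolve(packageIdentifier);
    }
}
```

For example, `new DownloadSettings("https://ted.europa.eu/packages/daily/", 4, 3600000L, 300000L, 5000L).packageUrl("202400001")` resolves against the trailing slash of the base URL, so the identifier becomes the last path segment.
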
## Database Migration

```sql
CREATE TABLE TED.ted_daily_package (
    id UUID PRIMARY KEY,
    package_identifier VARCHAR(20) NOT NULL UNIQUE,
    year INTEGER NOT NULL,
    serial_number INTEGER NOT NULL,
    download_url VARCHAR(500) NOT NULL,
    file_hash VARCHAR(64),
    xml_file_count INTEGER,
    processed_count INTEGER DEFAULT 0,
    failed_count INTEGER DEFAULT 0,
    download_status VARCHAR(30) NOT NULL DEFAULT 'PENDING',
    error_message TEXT,
    downloaded_at TIMESTAMP WITH TIME ZONE,
    processed_at TIMESTAMP WITH TIME ZONE,
    download_duration_ms BIGINT,
    processing_duration_ms BIGINT,
    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(year, serial_number)
);

CREATE INDEX idx_package_identifier ON TED.ted_daily_package(package_identifier);
CREATE INDEX idx_package_year_serial ON TED.ted_daily_package(year, serial_number);
CREATE INDEX idx_package_status ON TED.ted_daily_package(download_status);
CREATE INDEX idx_package_downloaded_at ON TED.ted_daily_package(downloaded_at);
```

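The `download_status` column holds the states listed under Workflow. One way to keep those values consistent on the Java side is an enum that guards transitions, sketched below. `DownloadStatus`, `allowedNext`, and `canMoveTo` are hypothetical names, and the error transitions as well as the choice to treat FAILED and NOT_FOUND as terminal are assumptions, since the text only gives the happy-path order.

```java
import java.util.EnumSet;
import java.util.Set;

/** Status values as listed in the workflow; transition rules are assumed. */
enum DownloadStatus {
    PENDING, DOWNLOADING, DOWNLOADED, PROCESSING, COMPLETED, FAILED, NOT_FOUND;

    /** Returns the states that may directly follow this one. */
    Set<DownloadStatus> allowedNext() {
        switch (this) {
            case PENDING:
                return EnumSet.of(DOWNLOADING);
            case DOWNLOADING:
                return EnumSet.of(DOWNLOADED, FAILED, NOT_FOUND);
            case DOWNLOADED:
                return EnumSet.of(PROCESSING, FAILED);
            case PROCESSING:
                return EnumSet.of(COMPLETED, FAILED);
            default:
                // COMPLETED, FAILED and NOT_FOUND are treated as terminal here.
                return EnumSet.noneOf(DownloadStatus.class);
        }
    }

    /** True if moving from this state to the given one follows the assumed rules. */
    boolean canMoveTo(DownloadStatus next) {
        return allowedNext().contains(next);
    }
}
```

If failed packages should be retried, FAILED would need an additional transition back to PENDING.
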
## Next Steps

1. Finish the package download service
2. Create the Camel route
3. Run the database migration
4. Testing & integration