Memory Optimization Changes
Problem
Persistent OutOfMemoryError crashes after ~30 minutes of operation.
Root Causes Identified
- Parallel Processing - Too many concurrent threads processing XML files
- Vectorization - Heavy memory consumption from embedding service calls
- Connection Pool - HikariCP pool too large (20 connections), with suspected connection leaks
- Duplicate File Processing - File Consumer route was disabled but still causing issues
Changes Made (2026-01-07)
1. Vectorization DISABLED
File: application.yml
vectorization:
  enabled: false # Was: true
Reason: Embedding service calls were a major memory consumer. Vectorization can be re-enabled once stability is proven.
2. Reduced Database Connection Pool
File: application.yml
hikari:
  maximum-pool-size: 5 # Was: 20
  minimum-idle: 2 # Was: 5
  idle-timeout: 300000 # 5 min; was: 600000 (10 min)
  max-lifetime: 900000 # 15 min; was: 1800000 (30 min)
  leak-detection-threshold: 60000 # NEW: 60 s
3. Sequential Processing (No Parallelism)
File: TedPackageDownloadCamelRoute.java
- Parallel Processing DISABLED in XML file splitter
- Thread pool reduced to 1 thread (was: 3)
- Only 1 package processed at a time (was: 3)
.split(header("xmlFiles"))
    // .parallelProcessing() // DISABLED
    .stopOnException(false)
4. File Consumer Already Disabled
File: TedDocumentRoute.java
- File consumer route commented out to prevent duplicate processing
- Only Package Download Route processes files
5. Start Script with 8GB Heap
File: start.bat
java -Xms4g -Xmx8g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -jar target\ted-procurement-processor-1.0.0-SNAPSHOT.jar
Performance Impact
Before
- 3 packages in parallel
- 3 XML files in parallel per package
- Vectorization running
- ~150 concurrent operations
- Crashes after 30 minutes
After
- 1 package at a time
- Sequential XML file processing
- No vectorization
- ~10-20 concurrent operations
- Expected to run stably; to be confirmed by the 24-48 hour test run
How to Start
1. Reset stuck packages (if any):
   psql -h 94.130.218.54 -p 32333 -U postgres -d RELM -f reset-stuck-packages.sql
2. Start application:
   start.bat
3. Monitor memory:
   - Check logs for OutOfMemoryError
   - Monitor with: jconsole or jvisualvm
Re-enabling Features Later
Step 1: Test with current settings
Run for 24-48 hours to confirm stability
Step 2: Gradually increase parallelism
// In TedPackageDownloadCamelRoute.java
.split(header("xmlFiles"))
    .parallelProcessing()
    .executorService(executorService()) // Set to 2-3 threads
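Step 2 passes an executor to the splitter without showing how it is built. The following is a minimal sketch of one way such an `executorService()` could be written, assuming a plain `ThreadPoolExecutor`. The class name, the `ted-xml-` thread-name prefix, and the queue capacity are illustrative assumptions, not taken from the project.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: the real executorService() in the route class may differ.
class BoundedExecutorSketch {

    /** Builds a fixed-size pool with a bounded queue; threads are named "ted-xml-N" (assumed prefix). */
    static ExecutorService boundedExecutor(int threads, int queueCapacity) {
        AtomicInteger counter = new AtomicInteger();
        return new ThreadPoolExecutor(
                threads, threads,                            // core == max: the pool never grows past `threads`
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),     // bounded queue caps pending work held in memory
                r -> new Thread(r, "ted-xml-" + counter.incrementAndGet()),
                new ThreadPoolExecutor.CallerRunsPolicy());  // queue full: the submitting thread runs the task
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = boundedExecutor(2, 10);
        for (int i = 0; i < 5; i++) {
            int n = i;
            pool.execute(() ->
                    System.out.println(Thread.currentThread().getName() + " handled file " + n));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

With `CallerRunsPolicy`, a full queue makes the submitting thread run the task itself, so effective concurrency can briefly reach `threads + 1`. The benefit is that queued work cannot pile up without limit, which matters when each task holds a parsed XML document in memory.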
Step 3: Re-enable vectorization
# In application.yml
vectorization:
  enabled: true
Step 4: Increase connection pool (if needed)
hikari:
  maximum-pool-size: 10 # Increase gradually
Monitoring Commands
Check running packages
SELECT package_identifier, download_status, updated_at
FROM ted.ted_daily_package
WHERE download_status IN ('DOWNLOADING', 'PROCESSING')
ORDER BY updated_at DESC;
Check memory usage
jcmd <PID> GC.heap_info
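The same heap figures can also be read from inside the JVM, which is useful for writing them to the application log at intervals. This is a minimal sketch using the standard `java.lang.management` API; the class and method names are hypothetical and nothing like this is confirmed to exist in the project.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper for logging heap usage from inside the application.
class HeapUsageSketch {

    /** Returns used heap as a percentage of the maximum heap (-Xmx), or -1 if the maximum is undefined. */
    static double usedHeapPercent() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when the JVM reports no defined maximum
        return max > 0 ? 100.0 * heap.getUsed() / max : -1;
    }

    /** Formats used, committed, and maximum heap in MiB for a log line. */
    static String describeHeap() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long mib = 1024L * 1024L;
        String max = heap.getMax() > 0 ? (heap.getMax() / mib) + " MiB" : "undefined";
        return String.format("heap used=%d MiB, committed=%d MiB, max=%s",
                heap.getUsed() / mib, heap.getCommitted() / mib, max);
    }

    public static void main(String[] args) {
        System.out.println(describeHeap());
        System.out.printf("used: %.1f%% of max%n", usedHeapPercent());
    }
}
```

When the application is started with `-Xmx8g`, the reported maximum should be close to 8192 MiB; a used percentage that keeps climbing across garbage collections is an early sign of the original problem returning.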
Check thread count
jcmd <PID> Thread.print | grep "ted-" | wc -l
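The pipeline above relies on `grep` and `wc`, which are not available in a stock Windows shell, while the start script is a `.bat` file. As an alternative, the count can be taken from inside the JVM. This is a minimal sketch, assuming the worker thread names start with `ted-` as the `grep` pattern suggests; the class and method names are hypothetical.

```java
// Hypothetical helper: counts live threads in the current JVM by name prefix.
class ThreadCountSketch {

    /** Counts live threads whose name starts with the given prefix. */
    static long countThreadsWithPrefix(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(prefix))
                .count();
    }

    public static void main(String[] args) {
        // Only sees threads of the JVM it runs in, so it has to be called from the application itself.
        System.out.println("ted- threads: " + countThreadsWithPrefix("ted-"));
    }
}
```

Unlike the `jcmd` pipeline, which counts every output line containing `ted-`, this counts threads whose name begins with the prefix, so the two numbers can differ slightly.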
Notes
- Processing is slower but stable
- Approx. 50-100 documents/minute (sequential)
- Can process ~100,000 documents/day (50-100 documents/minute over 1,440 minutes gives roughly 72,000-144,000)
- Vectorization can be run as a separate batch job later