Memory Optimization Changes

Problem

Persistent OutOfMemoryError crashes after ~30 minutes of operation.

Root Causes Identified

  1. Parallel Processing - Too many concurrent threads processing XML files
  2. Vectorization - Heavy memory consumption from embedding service calls
  3. Database Connections - HikariCP pool too large (20 connections), with no leak detection configured
  4. Duplicate File Processing - File Consumer route was disabled but still causing issues

Changes Made (2026-01-07)

1. Vectorization DISABLED

File: application.yml

vectorization:
  enabled: false  # Was: true

Reason: Embedding service calls consume a lot of memory (root cause 2). Vectorization can be re-enabled once stability is proven.

2. Reduced Database Connection Pool

File: application.yml

hikari:
  maximum-pool-size: 5      # Was: 20
  minimum-idle: 2           # Was: 5
  idle-timeout: 300000      # 5 min (was: 600000 = 10 min)
  max-lifetime: 900000      # 15 min (was: 1800000 = 30 min)
  leak-detection-threshold: 60000  # NEW: 60 s; logs a warning when a connection is held longer

3. Sequential Processing (No Parallelism)

File: TedPackageDownloadCamelRoute.java

  • Parallel Processing DISABLED in XML file splitter
  • Thread pool reduced to 1 thread (was: 3)
  • Only 1 package processed at a time (was: 3)

.split(header("xmlFiles"))
    // .parallelProcessing()  // DISABLED
    .stopOnException(false)

4. File Consumer Already Disabled

File: TedDocumentRoute.java

  • File consumer route commented out to prevent duplicate processing
  • Only Package Download Route processes files

5. Start Script with 8GB Heap

File: start.bat

java -Xms4g -Xmx8g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -jar target\ted-procurement-processor-1.0.0-SNAPSHOT.jar
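
To confirm that the heap flags from start.bat actually reached the JVM, the running process can report its own limits. The following is a minimal sketch and not part of the project: the class name JvmSettingsSketch and both method names are hypothetical, and it relies only on standard JDK calls (Runtime.maxMemory and RuntimeMXBean.getInputArguments).

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper; not part of the project.
class JvmSettingsSketch {

    /** Maximum heap the JVM will try to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Flags passed to the JVM on the command line, e.g. -Xms4g, -Xmx8g, -XX:+UseG1GC. */
    static List<String> jvmFlags() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB): " + maxHeapMiB());
        System.out.println("JVM flags:      " + jvmFlags());
    }
}
```

With -Xmx8g the reported maximum should be close to 8192 MiB; the exact figure can vary slightly with the collector and JVM version.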

Performance Impact

Before

  • 3 packages in parallel
  • 3 XML files in parallel per package
  • Vectorization running
  • ~150 concurrent operations
  • OutOfMemoryError crash after ~30 minutes

After

  • 1 package at a time
  • Sequential XML file processing
  • No vectorization
  • ~10-20 concurrent operations
  • Expected to run stably (to be confirmed by the 24-48 hour test run)

How to Start

  1. Reset stuck packages (if any):

    psql -h 94.130.218.54 -p 32333 -U postgres -d RELM -f reset-stuck-packages.sql
    
  2. Start application:

    start.bat
    
  3. Monitor memory:

    • Check logs for OutOfMemoryError
    • Monitor with: jconsole or VisualVM (jvisualvm, bundled with the JDK only up to Java 8)
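
Besides attaching an external tool, heap usage can be read from inside the JVM and written to the application log. The following is a minimal sketch under that assumption: the class HeapCheckSketch and the method heapUsedPercent are hypothetical names, and the code uses only the standard MemoryMXBean API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper; not part of the project.
class HeapCheckSketch {

    /** Used heap as a percentage of the maximum heap, or -1 if the JVM reports no maximum. */
    static double heapUsedPercent() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when the maximum is undefined
        if (max <= 0) {
            return -1.0;
        }
        return 100.0 * heap.getUsed() / max;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long mib = 1024L * 1024L;
        System.out.printf("heap used=%d MiB, committed=%d MiB (%.1f%% of max)%n",
                heap.getUsed() / mib, heap.getCommitted() / mib, heapUsedPercent());
    }
}
```

Calling heapUsedPercent() from a scheduled task and logging the value would show whether usage climbs steadily during the test run, which is the pattern to look for before an OutOfMemoryError.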

Re-enabling Features Later

Step 1: Test with current settings

Run for 24-48 hours to confirm stability

Step 2: Gradually increase parallelism

// In TedPackageDownloadCamelRoute.java
.split(header("xmlFiles"))
    .parallelProcessing()
    .executorService(executorService())  // Set to 2-3 threads
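
The executorService() method referenced above is not shown in this document. One way it might be written with the plain JDK is a fixed-size pool with a bounded queue, so that pending work cannot pile up in memory. This is a sketch under assumptions: the class name, method names, queue capacity, and the "ted-xml-" thread name are all hypothetical (the "ted-" prefix is chosen only to match the thread-count check under Monitoring Commands).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for executorService(); the real method is not shown in this document.
class BoundedExecutorSketch {

    static ExecutorService boundedExecutor(int threads, int queueCapacity) {
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, "ted-xml-" + counter.incrementAndGet()); // assumed naming
            t.setDaemon(true);
            return t;
        };
        return new ThreadPoolExecutor(
                threads, threads,                          // fixed pool size
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),   // bounded queue limits pending tasks
                factory,
                // When the queue is full, the submitting thread runs the task itself (backpressure).
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Submits short tasks, then returns the highest number observed running at the same time. */
    static int maxObservedConcurrency(ExecutorService pool, int tasks) {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(5);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                running.decrementAndGet();
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        int peak = maxObservedConcurrency(boundedExecutor(2, 10), 50);
        System.out.println("peak concurrency = " + peak);
    }
}
```

Because of the caller-runs policy, peak concurrency is at most the pool size plus one (the pool threads plus the submitting thread). Starting at 2 threads and raising the value only after a stable run keeps the increase gradual.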

Step 3: Re-enable vectorization

# In application.yml
vectorization:
  enabled: true

Step 4: Increase connection pool (if needed)

hikari:
  maximum-pool-size: 10  # Increase gradually

Monitoring Commands

Check running packages

SELECT package_identifier, download_status, updated_at
FROM ted.ted_daily_package
WHERE download_status IN ('DOWNLOADING', 'PROCESSING')
ORDER BY updated_at DESC;

Check memory usage

jcmd <PID> GC.heap_info

Check thread count

jcmd <PID> Thread.print | grep "ted-" | wc -l
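
The command above needs grep and wc, which a plain Windows command prompt does not provide, while the application is started from start.bat. As an alternative, the count can be taken inside the JVM. The following is a minimal sketch: the class and method names are hypothetical, and the "ted-" prefix is taken from the command above. Unlike grep, it matches only thread names that start with the prefix.

```java
// Hypothetical helper; not part of the project.
class ThreadCountSketch {

    /** Counts live threads in this JVM whose name starts with the given prefix. */
    static long countThreadsWithPrefix(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(prefix))
                .count();
    }

    public static void main(String[] args) {
        System.out.println("ted- threads: " + countThreadsWithPrefix("ted-"));
    }
}
```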

Notes

  • Processing is slower but stable
  • Approx. 50-100 documents/minute (sequential)
  • Can process ~100,000 documents/day when running around the clock (50-100 documents/minute × 1,440 minutes/day = 72,000-144,000)
  • Vectorization can be run as a separate batch job later