# Memory Optimization Changes

## Problem
The application crashed repeatedly with `OutOfMemoryError` after ~30 minutes of operation.

## Root Causes Identified
1. **Parallel Processing** - Too many concurrent threads processing XML files
2. **Vectorization** - Heavy memory consumption from embedding service calls
3. **Connection Leaks** - HikariCP pool too large (20 connections), with no leak detection configured
4. **Duplicate File Processing** - The File Consumer route was disabled but was still causing issues

## Changes Made (2026-01-07)

### 1. Vectorization DISABLED
**File**: `application.yml`
```yaml
vectorization:
  enabled: false  # Was: true
```

**Reason**: Embedding service calls were a major source of memory consumption (root cause 2). Vectorization can be re-enabled once stability is proven.

### 2. Reduced Database Connection Pool
**File**: `application.yml`
```yaml
hikari:
  maximum-pool-size: 5             # Was: 20
  minimum-idle: 2                  # Was: 5
  idle-timeout: 300000             # Was: 600000
  max-lifetime: 900000             # Was: 1800000
  leak-detection-threshold: 60000  # NEW
```

All timeout values are in milliseconds.

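As a quick sanity check on the millisecond values in the pool settings above, the following sketch converts them to minutes using only the JDK. The class name `HikariTimeouts` is hypothetical and not part of the application.

```java
import java.time.Duration;

public class HikariTimeouts {
    /** Converts a millisecond setting to whole minutes. */
    static long minutes(long millis) {
        return Duration.ofMillis(millis).toMinutes();
    }

    public static void main(String[] args) {
        System.out.println("idle-timeout: " + minutes(300_000) + " min");            // 5 min
        System.out.println("max-lifetime: " + minutes(900_000) + " min");            // 15 min
        System.out.println("leak-detection-threshold: " + minutes(60_000) + " min"); // 1 min
    }
}
```
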
### 3. Sequential Processing (No Parallelism)
**File**: `TedPackageDownloadCamelRoute.java`
- **Parallel Processing DISABLED** in XML file splitter
- Thread pool reduced to 1 thread (was: 3)
- Only 1 package processed at a time (was: 3)

```java
.split(header("xmlFiles"))
    // .parallelProcessing()  // DISABLED
    .stopOnException(false)
```

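To illustrate why a pool of 1 thread bounds memory use, here is a minimal plain-JDK sketch (not the Camel route itself). It measures how many tasks run at the same time for a given pool size. The class name, the file names, and the `Thread.sleep` stand-in for XML parsing are all assumptions made for the example.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SequentialSketch {
    /** Runs one task per file on a pool of the given size; returns the peak number of tasks running at once. */
    static int peakConcurrency(List<String> files, int poolSize) {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (String file : files) {
            pool.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(5); // stand-in for parsing one XML file
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return peak.get();
    }

    public static void main(String[] args) {
        List<String> files = List.of("a.xml", "b.xml", "c.xml", "d.xml");
        System.out.println("pool of 1, peak = " + peakConcurrency(files, 1)); // always 1
        System.out.println("pool of 3, peak = " + peakConcurrency(files, 3)); // at most 3
    }
}
```

With a pool of 1, only one file's data is in flight at a time, which is the trade-off accepted here: lower throughput in exchange for a predictable memory footprint.
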
### 4. File Consumer Already Disabled
**File**: `TedDocumentRoute.java`
- File consumer route commented out to prevent duplicate processing
- Only Package Download Route processes files

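### 5. Start Script with 8GB Heap
**File**: `start.bat`
```batch
java -Xms4g -Xmx8g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -jar target\ted-procurement-processor-1.0.0-SNAPSHOT.jar
```

`-Xms4g` sets the initial heap to 4 GB and `-Xmx8g` caps it at 8 GB. The G1 collector is used with a pause-time goal of 200 ms.
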
## Performance Impact

### Before
- 3 packages in parallel
- 3 XML files in parallel per package
- Vectorization running
- ~150 concurrent operations
- **Crashes after 30 minutes**

### After
- 1 package at a time
- Sequential XML file processing
- No vectorization
- ~10-20 concurrent operations
- **Expected to run stably** (to be confirmed by a 24-48 hour test run)

## How to Start

1. **Reset stuck packages** (if any):
   ```bash
   psql -h 94.130.218.54 -p 32333 -U postgres -d RELM -f reset-stuck-packages.sql
   ```

2. **Start application**:
   ```batch
   start.bat
   ```

3. **Monitor memory**:
   - Check logs for `OutOfMemoryError`
   - Monitor with: `jconsole` or `jvisualvm`

## Re-enabling Features Later

### Step 1: Test with current settings
Run for 24-48 hours to confirm stability.

### Step 2: Gradually increase parallelism
```java
// In TedPackageDownloadCamelRoute.java
.split(header("xmlFiles"))
    .parallelProcessing()
    .executorService(executorService())  // Set to 2-3 threads
```

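The `executorService()` method referenced in Step 2 is not shown in this document. One possible shape, sketched with the JDK only, is a fixed-size pool with named threads. The `ted-xml-` name prefix and the class name are assumptions, chosen so that the threads would be matched by the `ted-` filter used under Monitoring Commands.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedPoolSketch {
    /** Builds a fixed-size pool whose threads are named ted-xml-1, ted-xml-2, ... (hypothetical naming). */
    static ExecutorService boundedPool(int threads) {
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, "ted-xml-" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
        return Executors.newFixedThreadPool(threads, factory);
    }

    /** Runs one task on a fresh pool and returns the name of the worker thread that executed it. */
    static String workerName(int threads) {
        ExecutorService pool = boundedPool(threads);
        Callable<String> task = () -> Thread.currentThread().getName();
        try {
            return pool.submit(task).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(workerName(2)); // prints "ted-xml-1"
    }
}
```

A fixed pool never grows beyond the configured size, so raising it from 1 to 2 and then 3 lets memory behaviour be observed one step at a time.
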
### Step 3: Re-enable vectorization
```yaml
# In application.yml
vectorization:
  enabled: true
```

### Step 4: Increase connection pool (if needed)
```yaml
hikari:
  maximum-pool-size: 10  # Increase gradually
```

## Monitoring Commands

### Check running packages
```sql
SELECT package_identifier, download_status, updated_at
FROM ted.ted_daily_package
WHERE download_status IN ('DOWNLOADING', 'PROCESSING')
ORDER BY updated_at DESC;
```

### Check memory usage
```bash
jcmd <PID> GC.heap_info
```

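If attaching `jcmd` is inconvenient, similar heap figures can be read from inside the JVM through the standard `java.lang.management` API. This is only a sketch: the class name `HeapCheck` is hypothetical, and where such a check would be called from (for example a periodic log statement) is left open.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapCheck {
    /** Converts bytes to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** Returns a one-line summary of current heap usage. */
    static String heapSummary() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        // getMax() returns -1 when no maximum is defined
        String max = heap.getMax() < 0 ? "undefined" : toMiB(heap.getMax()) + " MiB";
        return "heap used=" + toMiB(heap.getUsed()) + " MiB"
                + ", committed=" + toMiB(heap.getCommitted()) + " MiB"
                + ", max=" + max;
    }

    public static void main(String[] args) {
        System.out.println(heapSummary());
    }
}
```
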
### Check thread count
```bash
jcmd <PID> Thread.print | grep "ted-" | wc -l
```

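The same count can be approximated in-process with the JDK alone. The sketch below counts live threads whose name starts with a given prefix. Note that the `grep` pipeline counts every output line containing `ted-`, so the two numbers may differ. The class name and the demo thread are assumptions made for illustration.

```java
public class ThreadCount {
    /** Counts live threads whose name starts with the given prefix. */
    static long countThreads(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(prefix))
                .count();
    }

    /** Starts a short-lived daemon thread with the given name (demo helper only). */
    static Thread startSleeper(String name) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) {
        startSleeper("ted-demo-1");
        System.out.println("ted- threads: " + countThreads("ted-"));
    }
}
```
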
## Notes
- **Processing is slower** but stable
- Approx. 50-100 documents/minute (sequential)
- Can process ~100,000 documents/day
- Vectorization can be run as a separate batch job later
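
The daily figure in the notes follows from the per-minute rate, assuming the application runs continuously for 24 hours. A small worked calculation (class name hypothetical):

```java
public class Throughput {
    /** Documents per day at a constant per-minute rate, running 24 hours. */
    static long docsPerDay(long docsPerMinute) {
        return docsPerMinute * 60 * 24;
    }

    public static void main(String[] args) {
        System.out.println(docsPerDay(50));  // 72000
        System.out.println(docsPerDay(100)); // 144000
    }
}
```

The stated ~100,000 documents/day sits inside this 72,000 to 144,000 range.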