batch embedding

master
trifonovt 2 weeks ago
parent 00ad3aad38
commit 152d9739af

# Vector-sync HTTP embedding provider
This patch adds a new provider type, `http-vector-sync`. The provider supports two endpoints:
- `POST {baseUrl}/vector-sync` for single-text requests
- `POST {baseUrl}/vectorize-batch` for batch document requests
## Single request
Request body:
```json
{
  …
}
```
## Response
```json
{
  "runtime_ms": 472.49,
  "vector": [0.1, 0.2, 0.3],
  "incomplete": false,
  "combined_vector": null,
  "token_count": 9,
  "model": "intfloat/multilingual-e5-large",
  "max_seq_length": 512
}
```
## Batch request
Request body:
```json
{
  "truncate_text": false,
  "truncate_length": 512,
  "chunk_size": 20,
  "items": [
    {
      "id": "2f48fd5c-9d39-4d80-9225-ea0c59c77c9a",
      "text": "This is a sample text to vectorize"
    }
  ]
}
```
## Notes
- `/vector-sync` accepts a single text per request; `/vectorize-batch` accepts multiple items
- works for both document and query embeddings
- validates the returned vector dimension against the configured embedding model
- keeps the existing `/embed` provider in place as `http-json`
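For illustration, a `/vectorize-batch` body like the one above could be assembled from a plain list of texts. This is only a sketch, not code from this patch: `VectorizeBatchSketch` and `buildBatchBody` are invented names, and a real client would use a JSON library instead of string formatting.

```java
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;

// Hypothetical sketch: building a /vectorize-batch request body from raw texts.
// The class and method names are illustrative, not part of the patch.
public class VectorizeBatchSketch {

    static String buildBatchBody(List<String> texts, boolean truncate,
                                 int truncateLength, int chunkSize) {
        // One item per input text, each with a freshly generated id.
        String items = texts.stream()
                .map(t -> String.format("{\"id\": \"%s\", \"text\": \"%s\"}",
                        UUID.randomUUID(), t))
                .collect(Collectors.joining(", "));
        return String.format(
                "{\"truncate_text\": %b, \"truncate_length\": %d, \"chunk_size\": %d, \"items\": [%s]}",
                truncate, truncateLength, chunkSize, items);
    }

    public static void main(String[] args) {
        System.out.println(buildBatchBody(
                List.of("first text", "second text"), false, 512, 20));
    }
}
```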
## Provider configuration
```yaml
batch-request:
truncate-text: false
truncate-length: 512
chunk-size: 20
```
These values are used for `/vectorize-batch` calls and can also be overridden per request via `EmbeddingRequest.providerOptions()`.
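Conceptually, the override could resolve as a simple map merge where request-level values win over the YAML defaults. The sketch below is an assumption about the behavior, not the provider's actual code; `BatchOptionsSketch` and `resolveBatchOptions` are made-up names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of per-request providerOptions overriding the
// batch-request YAML defaults; names here are illustrative only.
public class BatchOptionsSketch {

    static Map<String, Object> resolveBatchOptions(Map<String, Object> yamlDefaults,
                                                   Map<String, Object> providerOptions) {
        Map<String, Object> resolved = new HashMap<>(yamlDefaults);
        resolved.putAll(providerOptions); // request-level values win
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, Object> defaults = Map.of(
                "truncate_text", false, "truncate_length", 512, "chunk_size", 20);
        Map<String, Object> perRequest = Map.of("chunk_size", 5);
        System.out.println(resolveBatchOptions(defaults, perRequest));
    }
}
```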
## Orchestrator batch processing
To let `RepresentationEmbeddingOrchestrator` send multiple representations in one provider call, enable batch processing for jobs and for the model:
```yaml
dip:
embedding:
jobs:
enabled: true
process-in-batches: true
execution-batch-size: 20
models:
e5-default:
supports-batch: true
```
## Example config
See `application-new-example-vector-sync-provider.yml`.
Notes:
- jobs are grouped by `modelKey`
- non-batch-capable models still fall back to single-item execution
- `execution-batch-size` controls how many texts are sent in one `/vectorize-batch` request
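The grouping and batching rules in the notes above can be sketched as follows. This is a minimal illustration of the described behavior, not the orchestrator's real implementation; `Job`, `groupIntoBatches`, and `partition` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the batching rules described above: group jobs by
// modelKey, then split each group into execution batches, with non-batch-capable
// models falling back to single-item execution.
public class OrchestratorBatchSketch {

    record Job(String modelKey, String text) {}

    static Map<String, List<List<Job>>> groupIntoBatches(List<Job> jobs,
                                                         int executionBatchSize,
                                                         Map<String, Boolean> supportsBatch) {
        Map<String, List<Job>> byModel = new LinkedHashMap<>();
        for (Job job : jobs) {
            byModel.computeIfAbsent(job.modelKey(), k -> new ArrayList<>()).add(job);
        }
        Map<String, List<List<Job>>> batches = new LinkedHashMap<>();
        byModel.forEach((modelKey, group) -> {
            // Models without batch support get batches of size 1.
            int size = supportsBatch.getOrDefault(modelKey, false) ? executionBatchSize : 1;
            batches.put(modelKey, partition(group, size));
        });
        return batches;
    }

    static List<List<Job>> partition(List<Job> jobs, int size) {
        List<List<Job>> parts = new ArrayList<>();
        for (int i = 0; i < jobs.size(); i += size) {
            parts.add(jobs.subList(i, Math.min(i + size, jobs.size())));
        }
        return parts;
    }

    public static void main(String[] args) {
        List<Job> jobs = List.of(
                new Job("e5-default", "a"), new Job("e5-default", "b"),
                new Job("e5-default", "c"), new Job("legacy", "d"));
        System.out.println(groupIntoBatches(jobs, 2,
                Map.of("e5-default", true, "legacy", false)));
    }
}
```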
