# Vector-sync HTTP embedding provider
This provider supports two endpoints:
- `POST {baseUrl}/vector-sync` for single-text requests
- `POST {baseUrl}/vectorize-batch` for batch document requests
## Single request
Request body:
```json
{
  "model": "intfloat/multilingual-e5-large",
  "text": "This is a sample text to vectorize"
}
```
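
As a rough illustration, a client could assemble this call with the Python standard library as sketched below. The helper name `build_single_request` and the base URL are assumptions made for the example. The response format is not described here, so the sketch only builds the request and does not parse a reply.

```python
import json
import urllib.request


def build_single_request(base_url: str, model: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST {baseUrl}/vector-sync request."""
    body = json.dumps({"model": model, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/vector-sync",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical base URL, for illustration only.
req = build_single_request(
    "http://localhost:8080",
    "intfloat/multilingual-e5-large",
    "This is a sample text to vectorize",
)
```

Passing `req` to `urllib.request.urlopen` would send it; how the returned body should be decoded depends on the service.
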
## Batch request
Request body:
```json
{
  "model": "intfloat/multilingual-e5-large",
  "truncate_text": false,
  "truncate_length": 512,
  "chunk_size": 20,
  "items": [
    {
      "id": "2f48fd5c-9d39-4d80-9225-ea0c59c77c9a",
      "text": "This is a sample text to vectorize"
    }
  ]
}
```
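
The batch body can be assembled from `(id, text)` pairs as in the following sketch. The function name `build_batch_payload` is hypothetical, and the default values simply mirror the example above; they are not confirmed defaults of the service.

```python
import json


def build_batch_payload(model, items, truncate_text=False,
                        truncate_length=512, chunk_size=20):
    """Return a dict shaped like the /vectorize-batch request body.

    `items` is an iterable of (id, text) pairs supplied by the caller.
    """
    return {
        "model": model,
        "truncate_text": truncate_text,
        "truncate_length": truncate_length,
        "chunk_size": chunk_size,
        "items": [{"id": item_id, "text": text} for item_id, text in items],
    }


payload = build_batch_payload(
    "intfloat/multilingual-e5-large",
    [("2f48fd5c-9d39-4d80-9225-ea0c59c77c9a", "This is a sample text to vectorize")],
)
body = json.dumps(payload)  # JSON string ready to be used as the POST body
```
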
## Provider configuration
```yaml
batch-request:
  truncate-text: false
  truncate-length: 512
  chunk-size: 20
```
These values are used for `/vectorize-batch` calls and can also be overridden per request via `EmbeddingRequest.providerOptions()`.
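
The override behaviour can be pictured as a merge of configured defaults with per-request options. The sketch below is an illustration only: it assumes the per-request options use the same kebab-case keys as the YAML, which is not confirmed here, and `resolve_batch_options` is a hypothetical helper rather than part of DIP.

```python
# Defaults as configured under `batch-request` in the provider configuration.
CONFIGURED_DEFAULTS = {
    "truncate-text": False,
    "truncate-length": 512,
    "chunk-size": 20,
}


def resolve_batch_options(defaults, provider_options=None):
    """Overlay per-request options on the defaults and return JSON field names."""
    resolved = dict(defaults)
    for key, value in (provider_options or {}).items():
        # Assumption: keys that are not batch settings are ignored.
        if key in resolved:
            resolved[key] = value
    # The request body uses snake_case field names (e.g. "chunk_size").
    return {key.replace("-", "_"): value for key, value in resolved.items()}


default_fields = resolve_batch_options(CONFIGURED_DEFAULTS)
overridden_fields = resolve_batch_options(CONFIGURED_DEFAULTS, {"chunk-size": 5})
```
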
## Orchestrator batch processing
To let `RepresentationEmbeddingOrchestrator` send multiple representations in one provider call, enable batch processing for jobs and for the model:
```yaml
dip:
  embedding:
    jobs:
      enabled: true
      parallel-batch-count: 1
      process-in-batches: true
      execution-batch-size: 20
    models:
      e5-default:
        supports-batch: true
```
Notes:
- jobs are grouped by `modelKey`
- non-batch-capable models still fall back to single-item execution
- `parallel-batch-count` controls how many claimed job batches may be started in parallel
- `execution-batch-size` controls how many texts are sent in one `/vectorize-batch` request inside each claimed job batch
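
The grouping and chunking described in the notes above can be sketched as follows. This is a simplified model, not the orchestrator's actual code: the `Job` type and `plan_provider_calls` are hypothetical, and parallel execution (`parallel-batch-count`) is left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    job_id: str
    model_key: str
    text: str


def plan_provider_calls(jobs, batch_capable_models, execution_batch_size):
    """Group jobs by model key, then split each group into provider calls.

    Batch-capable models get up to `execution_batch_size` texts per call;
    all other models fall back to one text per call.
    """
    by_model = defaultdict(list)
    for job in jobs:
        by_model[job.model_key].append(job)

    calls = []
    for model_key, group in by_model.items():
        size = execution_batch_size if model_key in batch_capable_models else 1
        for start in range(0, len(group), size):
            calls.append((model_key, group[start:start + size]))
    return calls


jobs = [Job(f"e5-{i}", "e5-default", f"text {i}") for i in range(5)]
jobs += [Job(f"legacy-{i}", "legacy-model", f"text {i}") for i in range(2)]
calls = plan_provider_calls(jobs, {"e5-default"}, execution_batch_size=2)
```

In this example the five `e5-default` jobs become three calls of sizes 2, 2, and 1, while the two `legacy-model` jobs (a made-up non-batch model key) are sent one at a time.
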
## E5 prefix handling
For models such as `intfloat/multilingual-e5-large`, configure prefix handling on the model:
```yaml
dip:
  embedding:
    models:
      e5-default:
        provider-config-key: vector-sync-e5
        provider-model-key: intfloat/multilingual-e5-large
        dimensions: 1024
        supports-batch: true
        prefix-mode: CLIENT
        query-prefix: "query: "
        document-prefix: "passage: "
```
Supported modes:
- `OFF` - DIP sends raw text
- `CLIENT` - DIP prepends the configured prefix before calling the provider
- `EXTERNAL` - DIP assumes the external service applies the prefixing itself
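
The three modes listed above amount to a small decision on the text before it is sent. A minimal sketch, assuming a hypothetical `apply_prefix` helper and an `is_query` flag to choose between the two configured prefixes:

```python
from enum import Enum


class PrefixMode(Enum):
    OFF = "OFF"
    CLIENT = "CLIENT"
    EXTERNAL = "EXTERNAL"


def apply_prefix(text, mode, is_query,
                 query_prefix="query: ", document_prefix="passage: "):
    """Return the text as it would be sent to the provider for a given mode."""
    if mode is PrefixMode.CLIENT:
        # The prefix is prepended on the caller's side before the request.
        return (query_prefix if is_query else document_prefix) + text
    # OFF sends raw text; EXTERNAL leaves prefixing to the external service.
    return text


client_query = apply_prefix("what is DIP?", PrefixMode.CLIENT, is_query=True)
client_doc = apply_prefix("DIP is a platform.", PrefixMode.CLIENT, is_query=False)
external_doc = apply_prefix("DIP is a platform.", PrefixMode.EXTERNAL, is_query=False)
```

`OFF` and `EXTERNAL` send identical bytes; they differ only in the provenance recorded for the resulting vectors.
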
For persisted document embeddings, the prefix provenance is stored in `doc.doc_embedding`:
- `prefix_profile_id` (resolved via `DOC.doc_embedding_prefix_profile`)
This makes it possible to identify whether indexed vectors were created with raw text, DIP-side prefixing, or externally handled prefixing before deciding on re-embedding.