
Vector-sync HTTP embedding provider

This provider calls two endpoints of the external embedding service:

  • POST {baseUrl}/vector-sync for single-text requests
  • POST {baseUrl}/vectorize-batch for batch document requests

Single request

Request body:

```json
{
  "model": "intfloat/multilingual-e5-large",
  "text": "This is a sample text to vectorize"
}
```
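The following is a minimal Python sketch that builds the single-text request with the standard library. It assumes a hypothetical base URL of http://localhost:8080 and a JSON Content-Type header, and neither is specified in this document. The sketch only constructs the request. The response format is not described here, so it does not parse one.

```python
import json
import urllib.request


def build_single_request(base_url: str, model: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST {baseUrl}/vector-sync request."""
    body = {"model": model, "text": text}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/vector-sync",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )


req = build_single_request(
    "http://localhost:8080",  # hypothetical base URL
    "intfloat/multilingual-e5-large",
    "This is a sample text to vectorize",
)
# urllib.request.urlopen(req) would send it; the response shape is not documented here.
```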

Batch request

Request body:

```json
{
  "model": "intfloat/multilingual-e5-large",
  "truncate_text": false,
  "truncate_length": 512,
  "chunk_size": 20,
  "items": [
    {
      "id": "2f48fd5c-9d39-4d80-9225-ea0c59c77c9a",
      "text": "This is a sample text to vectorize"
    }
  ]
}
```
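The batch body can be assembled in the same way. In this sketch, the helper name build_batch_body is hypothetical. It takes caller-chosen (id, text) pairs and uses the option values from the example as keyword defaults. Serializing the result with json.dumps(body, indent=2) produces the same body as the example.

```python
import json


def build_batch_body(model, items, truncate_text=False, truncate_length=512, chunk_size=20):
    """Build the JSON body for POST {baseUrl}/vectorize-batch.

    items is an iterable of caller-chosen (id, text) pairs. The keyword defaults
    mirror the example values, not necessarily DIP's real defaults.
    """
    return {
        "model": model,
        "truncate_text": truncate_text,
        "truncate_length": truncate_length,
        "chunk_size": chunk_size,
        "items": [{"id": item_id, "text": text} for item_id, text in items],
    }


body = build_batch_body(
    "intfloat/multilingual-e5-large",
    [("2f48fd5c-9d39-4d80-9225-ea0c59c77c9a", "This is a sample text to vectorize")],
)
print(json.dumps(body, indent=2))
```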

Provider configuration

```yaml
batch-request:
  truncate-text: false
  truncate-length: 512
  chunk-size: 20
```

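These values are the defaults for /vectorize-batch calls. They populate the truncate_text, truncate_length, and chunk_size fields of the request body, and each one can be overridden per request via EmbeddingRequest.providerOptions().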
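One way to picture the precedence is as a dictionary merge. This sketch is not DIP's code. It assumes, hypothetically, that EmbeddingRequest.providerOptions() carries the same kebab-case keys as the batch-request block, which this document does not confirm. Per-request values take priority over provider defaults, and the merged names are then mapped to the snake_case fields of the /vectorize-batch body.

```python
# Provider-level defaults, mirroring the batch-request configuration block.
BATCH_DEFAULTS = {
    "truncate-text": False,
    "truncate-length": 512,
    "chunk-size": 20,
}


def resolve_batch_options(provider_options=None):
    """Merge per-request overrides onto the defaults and map them to body fields.

    Hypothetical: assumes override keys reuse the kebab-case config names.
    Keys that are not batch-request settings are ignored.
    """
    overrides = provider_options or {}
    merged = {key: overrides.get(key, default) for key, default in BATCH_DEFAULTS.items()}
    # Config names are kebab-case; the /vectorize-batch body uses snake_case.
    return {key.replace("-", "_"): value for key, value in merged.items()}


print(resolve_batch_options())
# {'truncate_text': False, 'truncate_length': 512, 'chunk_size': 20}
print(resolve_batch_options({"truncate-text": True, "truncate-length": 256}))
# {'truncate_text': True, 'truncate_length': 256, 'chunk_size': 20}
```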

Orchestrator batch processing

To let RepresentationEmbeddingOrchestrator send multiple representations in one provider call, enable batch processing for jobs and for the model:

```yaml
dip:
  embedding:
    jobs:
      enabled: true
      parallel-batch-count: 1
      process-in-batches: true
      execution-batch-size: 20

    models:
      e5-default:
        supports-batch: true
```

Notes:

  • jobs are grouped by modelKey
  • non-batch-capable models still fall back to single-item execution
  • parallel-batch-count controls how many claimed job batches may be started in parallel
  • execution-batch-size controls how many texts are sent in one /vectorize-batch request inside each claimed job batch
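The following simplified Python sketch illustrates how these settings might interact when one claimed job batch is split into provider calls. It is not RepresentationEmbeddingOrchestrator itself. The Job type, the plan_provider_calls helper, and the legacy model key are hypothetical, and parallel-batch-count (how many claimed batches start in parallel) is not modelled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical stand-in for a claimed embedding job."""
    job_id: str
    model_key: str
    text: str


def plan_provider_calls(claimed_jobs, supports_batch, execution_batch_size):
    """Split one claimed job batch into provider calls.

    Returns a list of (model_key, jobs) tuples, one tuple per provider call.
    supports_batch maps a model key to its supports-batch flag; treating a
    missing key as False is an assumption of this sketch.
    """
    # Group the claimed jobs by modelKey, keeping first-seen order.
    by_model = defaultdict(list)
    for job in claimed_jobs:
        by_model[job.model_key].append(job)

    calls = []
    for model_key, jobs in by_model.items():
        if supports_batch.get(model_key, False):
            # Batch-capable: one /vectorize-batch call per execution-batch-size slice.
            for start in range(0, len(jobs), execution_batch_size):
                calls.append((model_key, jobs[start:start + execution_batch_size]))
        else:
            # Not batch-capable: fall back to one single-item call per job.
            calls.extend((model_key, [job]) for job in jobs)
    return calls


jobs = [Job(f"e{i}", "e5-default", f"text {i}") for i in range(45)]
jobs += [Job("l0", "legacy", "a"), Job("l1", "legacy", "b")]
calls = plan_provider_calls(jobs, {"e5-default": True, "legacy": False}, execution_batch_size=20)
print([(model_key, len(batch)) for model_key, batch in calls])
```

In this example, 45 jobs for e5-default and two for a model without batch support produce three batch calls of 20, 20, and 5 texts, followed by two single-item calls.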

E5 prefix handling

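E5 models expect each input to start with "query: " (for search queries) or "passage: " (for documents). For models such as intfloat/multilingual-e5-large, configure this prefix handling on the model: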

```yaml
dip:
  embedding:
    models:
      e5-default:
        provider-config-key: vector-sync-e5
        provider-model-key: intfloat/multilingual-e5-large
        dimensions: 1024
        supports-batch: true
        prefix-mode: CLIENT
        query-prefix: "query: "
        document-prefix: "passage: "
```

Supported modes:

  • OFF - DIP sends raw text
  • CLIENT - DIP prepends the configured prefix before calling the provider
  • EXTERNAL - DIP assumes the external service applies the prefixing itself
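The three modes come down to a small decision. This sketch uses hypothetical names (PrefixMode, prepare_text) and returns the text to send along with the prefix that DIP applied. Recording None for OFF and EXTERNAL is an assumption of the sketch. This document does not specify what DIP actually writes to applied_prefix in those modes.

```python
from enum import Enum


class PrefixMode(Enum):
    OFF = "OFF"
    CLIENT = "CLIENT"
    EXTERNAL = "EXTERNAL"


def prepare_text(text, mode, is_query, query_prefix="query: ", document_prefix="passage: "):
    """Return (text to send to the provider, prefix applied by DIP or None)."""
    if mode is PrefixMode.CLIENT:
        # DIP-side prefixing: pick the query or document prefix and prepend it.
        prefix = query_prefix if is_query else document_prefix
        return prefix + text, prefix
    # OFF sends raw text; EXTERNAL also sends raw text and leaves prefixing
    # to the external service. Returning None here is an assumption of this sketch.
    return text, None


print(prepare_text("This is a sample text to vectorize", PrefixMode.CLIENT, is_query=False))
# ('passage: This is a sample text to vectorize', 'passage: ')
```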

For persisted document embeddings, the prefix provenance of each vector is stored in doc.doc_embedding:

  • prefix_mode
  • applied_prefix

This makes it possible to tell whether indexed vectors were created from raw text, with DIP-side prefixing, or with externally handled prefixing before deciding whether to re-embed them.
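The following sketch shows one way to use the stored provenance: it flags rows whose prefix_mode or applied_prefix differs from a target setup. The rows and their id field are hypothetical stand-ins for records read from doc.doc_embedding, and the sketch assumes prefix_mode is stored as the mode name. The policy is deliberately strict. A team that knows its external service applies the same "passage: " prefix might reasonably treat EXTERNAL rows as equivalent.

```python
def needs_reembedding(row, target_mode, target_prefix):
    """True if a stored vector's prefix provenance differs from the target setup."""
    return row["prefix_mode"] != target_mode or row["applied_prefix"] != target_prefix


# Hypothetical rows standing in for doc.doc_embedding records.
rows = [
    {"id": 1, "prefix_mode": "OFF", "applied_prefix": None},
    {"id": 2, "prefix_mode": "CLIENT", "applied_prefix": "passage: "},
    {"id": 3, "prefix_mode": "EXTERNAL", "applied_prefix": None},
]

# Target: document vectors prefixed by DIP with "passage: ".
stale_ids = [row["id"] for row in rows if needs_reembedding(row, "CLIENT", "passage: ")]
print(stale_ids)  # [1, 3]
```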