Vector-sync HTTP embedding provider

This provider supports two endpoints:

POST {baseUrl}/vector-sync for single-text requests
POST {baseUrl}/vectorize-batch for batch document requests

Single request

Request body:

{
  "model": "intfloat/multilingual-e5-large",
  "text": "This is a sample text to vectorize"
}
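
As a rough illustration, the single-text call could be assembled on the client side as follows. This is a minimal sketch, not the provider's own client: the helper name `build_single_request`, the `http://localhost:8080` base URL, and the `Content-Type` header are assumptions, and the response format is not covered because it is not described here.

```python
import json
import urllib.request


def build_single_request(base_url: str, model: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for {baseUrl}/vector-sync."""
    body = json.dumps({"model": model, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/vector-sync",
        data=body,
        # Assumption: the endpoint expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical base URL, used only for illustration.
request = build_single_request(
    "http://localhost:8080",
    "intfloat/multilingual-e5-large",
    "This is a sample text to vectorize",
)
```

Passing `request` to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected without a running service.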

Batch request

Request body:

{
  "model": "intfloat/multilingual-e5-large",
  "truncate_text": false,
  "truncate_length": 512,
  "chunk_size": 20,
  "items": [
    {
      "id": "2f48fd5c-9d39-4d80-9225-ea0c59c77c9a",
      "text": "This is a sample text to vectorize"
    }
  ]
}
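
The batch body above can be built from a list of documents along these lines. This is a sketch under assumptions: the helper name `build_batch_body` is hypothetical, and the default argument values simply mirror the sample payload.

```python
import json


def build_batch_body(model, documents, truncate_text=False,
                     truncate_length=512, chunk_size=20):
    """Return the JSON-serializable body for POST {baseUrl}/vectorize-batch.

    `documents` is an iterable of (id, text) pairs supplied by the caller.
    """
    return {
        "model": model,
        "truncate_text": truncate_text,
        "truncate_length": truncate_length,
        "chunk_size": chunk_size,
        "items": [{"id": doc_id, "text": text} for doc_id, text in documents],
    }


body = build_batch_body(
    "intfloat/multilingual-e5-large",
    [("2f48fd5c-9d39-4d80-9225-ea0c59c77c9a", "This is a sample text to vectorize")],
)
payload = json.dumps(body)  # string to send as the request body
```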

Provider configuration

batch-request:
  truncate-text: false
  truncate-length: 512
  chunk-size: 20

These values are the defaults for /vectorize-batch calls. Each one can be overridden per request via EmbeddingRequest.providerOptions().
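
The precedence between configured values and per-request options can be pictured as a simple merge in which request options win. The sketch below only illustrates that idea and is not the provider's actual code: the function name `resolve_batch_options` is hypothetical, and it assumes the per-request options use the same kebab-case keys as the configuration, which may not match the real option names.

```python
# Configured values from the batch-request block above.
PROVIDER_DEFAULTS = {"truncate-text": False, "truncate-length": 512, "chunk-size": 20}


def resolve_batch_options(defaults, provider_options=None):
    """Merge configured values with per-request overrides (overrides win)
    and rename keys to the snake_case fields used in the batch request body."""
    merged = dict(defaults)
    merged.update(provider_options or {})
    return {key.replace("-", "_"): value for key, value in merged.items()}


# Assumption: a request overrides only chunk-size; the other values stay configured.
options = resolve_batch_options(PROVIDER_DEFAULTS, {"chunk-size": 5})
```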

Orchestrator batch processing

To let RepresentationEmbeddingOrchestrator send multiple representations in one provider call, enable batch processing for jobs and for the model:

dip:
  embedding:
    jobs:
      enabled: true
      parallel-batch-count: 1
      process-in-batches: true
      execution-batch-size: 20

    models:
      e5-default:
        supports-batch: true

Notes:

jobs are grouped by modelKey
models without supports-batch: true fall back to single-item execution
parallel-batch-count controls how many claimed job batches may be started in parallel
execution-batch-size controls how many texts are sent in one /vectorize-batch request inside each claimed job batch
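
The grouping and fallback rules in these notes can be sketched as below. This is an illustration of the described behaviour, not the code of RepresentationEmbeddingOrchestrator: the function `plan_provider_calls`, the dictionary shape of a job, and the `legacy-model` key are hypothetical, and parallel execution (parallel-batch-count) is left out.

```python
from collections import defaultdict


def plan_provider_calls(jobs, batch_capable_models, execution_batch_size):
    """Group jobs by modelKey, then split each group into provider calls.

    Batch-capable models get up to `execution_batch_size` jobs per call;
    all other models get one job per call (single-item fallback).
    """
    by_model = defaultdict(list)
    for job in jobs:
        by_model[job["modelKey"]].append(job)

    calls = []
    for model_key, group in by_model.items():
        size = execution_batch_size if model_key in batch_capable_models else 1
        for start in range(0, len(group), size):
            calls.append((model_key, group[start:start + size]))
    return calls


jobs = [
    {"modelKey": "e5-default", "text": "a"},
    {"modelKey": "legacy-model", "text": "b"},
    {"modelKey": "e5-default", "text": "c"},
    {"modelKey": "e5-default", "text": "d"},
]
calls = plan_provider_calls(jobs, batch_capable_models={"e5-default"}, execution_batch_size=2)
```

With these inputs the three `e5-default` jobs are split into one call of two texts and one call of one text, while the `legacy-model` job is sent on its own.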
