From 87fdae9f21e56b78ddda39a1f7955ab19a44c731 Mon Sep 17 00:00:00 2001
From: trifonovt <87468028+TihomirTrifonov@users.noreply.github.com>
Date: Fri, 20 Mar 2026 17:51:10 +0100
Subject: [PATCH] embedding nv2

---
 docs/embedding/NV2_IMPLEMENTATION_NOTES.md | 39 ++++++++++++++++++++++
 1 file changed, 39 insertions(+)
 create mode 100644 docs/embedding/NV2_IMPLEMENTATION_NOTES.md

diff --git a/docs/embedding/NV2_IMPLEMENTATION_NOTES.md b/docs/embedding/NV2_IMPLEMENTATION_NOTES.md
new file mode 100644
index 0000000..57e2b76
--- /dev/null
+++ b/docs/embedding/NV2_IMPLEMENTATION_NOTES.md
@@ -0,0 +1,39 @@
+# NV2 - Embedding persistence and job orchestration
+
+This patch continues the new parallel `at.procon.dip.embedding.*` subsystem introduced in NV1.
+
+## Scope
+
+NV2 adds:
+
+- representation-driven selection policy
+- `DOC.doc_embedding_job` queue table
+- job lifecycle service with retry scheduling
+- model-catalog sync into `DOC.doc_embedding_model`
+- persistence of vectors into `DOC.doc_embedding`
+- orchestrator for enqueueing and processing jobs
+- unit tests for the new orchestration layer
+
+## Still intentionally missing
+
+- no cutover of the old vectorization route
+- no scheduler / background polling by default
+- no semantic search engine yet
+- no migration / backfill yet
+
+## Intended usage
+
+New code can now do:
+
+1. enqueue a document or representation for embedding with a configured model key
+2. process the pending jobs through the new provider-based subsystem
+3. store the resulting vectors in the generic DOC embedding tables
+
+## Next step after NV2
+
+NV3 should add:
+
+- `PgVectorSemanticSearchEngine`
+- semantic repository
+- query embedding integration into the generic search engine
+- hybrid lexical + semantic fusion
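The three steps listed under "Intended usage" (enqueue, process, store) can be pictured with a small in-memory sketch. This is not the code from the patch: `Nv2FlowSketch`, `Job`, `Status`, `MAX_ATTEMPTS`, and the fake provider are hypothetical names chosen for illustration, the queue and map merely stand in for `DOC.doc_embedding_job` and `DOC.doc_embedding`, and the retry rule (re-queue until a fixed attempt limit) is an assumed simplification of the "retry scheduling" mentioned in the notes.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.Function;

// Hypothetical sketch: none of these type or method names are taken from the NV2 patch.
final class Nv2FlowSketch {

    enum Status { PENDING, RETRY_SCHEDULED, DONE, FAILED }

    // In-memory stand-in for one row of DOC.doc_embedding_job.
    static final class Job {
        final String representationId;
        final String modelKey;
        final String text;
        Status status = Status.PENDING;
        int attempts = 0;

        Job(String representationId, String modelKey, String text) {
            this.representationId = representationId;
            this.modelKey = modelKey;
            this.text = text;
        }
    }

    static final int MAX_ATTEMPTS = 3; // assumed retry limit, not from the patch

    final Queue<Job> queue = new ArrayDeque<>();           // stands in for the job table
    final Map<String, float[]> vectors = new HashMap<>();  // stands in for DOC.doc_embedding
    final Function<String, float[]> provider;              // stands in for an embedding provider

    Nv2FlowSketch(Function<String, float[]> provider) {
        this.provider = provider;
    }

    // Step 1: enqueue a representation for embedding with a model key.
    Job enqueue(String representationId, String modelKey, String text) {
        Job job = new Job(representationId, modelKey, text);
        queue.add(job);
        return job;
    }

    // Steps 2 and 3: run each queued job once and store successful vectors.
    // Failed jobs are re-queued for a later pass until MAX_ATTEMPTS is reached.
    int processPending() {
        int stored = 0;
        int batch = queue.size(); // only jobs present when this pass starts
        for (int i = 0; i < batch; i++) {
            Job job = queue.poll();
            job.attempts++;
            try {
                float[] vector = provider.apply(job.text);
                vectors.put(job.representationId + "|" + job.modelKey, vector);
                job.status = Status.DONE;
                stored++;
            } catch (RuntimeException e) {
                if (job.attempts < MAX_ATTEMPTS) {
                    job.status = Status.RETRY_SCHEDULED;
                    queue.add(job);
                } else {
                    job.status = Status.FAILED;
                }
            }
        }
        return stored;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fake provider: fails on its first call, then returns a tiny vector.
        Function<String, float[]> flaky = text -> {
            if (calls[0]++ == 0) {
                throw new IllegalStateException("provider unavailable");
            }
            return new float[] { text.length(), 1.0f };
        };

        Nv2FlowSketch flow = new Nv2FlowSketch(flaky);
        Job job = flow.enqueue("rep-1", "demo-model", "hello");
        flow.processPending(); // first attempt fails, job is scheduled for retry
        flow.processPending(); // second attempt succeeds, vector is stored
        System.out.println(job.status + " after " + job.attempts + " attempts");
        // prints "DONE after 2 attempts"
    }
}
```

Because the notes state that no scheduler or background polling is enabled by default, the sketch leaves it to the caller to invoke `processPending()` explicitly. A real job table would more likely record a next-attempt timestamp than re-queue immediately.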