# Phase 2 - Vectorization decoupling

Phase 2 moves the primary vectorization pipeline from `TED.procurement_document` to the generic `DOC`
representation and embedding model introduced in Phase 1.

Implemented in this phase:

- `DOC.doc_text_representation` is now the primary text source for embeddings
- `DOC.doc_embedding` is the primary persistence target for embedding lifecycle and vectors
- a generic Camel route processes pending/failed embeddings asynchronously
- TED imports dual-write into the generic model by creating:
  - canonical `DOC.doc_document`
  - original `DOC.doc_content`
  - primary `DOC.doc_text_representation`
  - pending `DOC.doc_embedding`
- compatibility mode keeps writing completed TED embeddings back into
  `TED.procurement_document.content_vector` so the legacy semantic search continues to work
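
The flow in the list above can be pictured with a small in-memory sketch. This is only an illustration under stated assumptions, not the project's actual code: the class `Phase2Sketch`, its methods, the `Status` enum, and keying the legacy vector by a TED document id are all hypothetical, and the real Camel route and database schema are not shown in this document. It models a pending `DOC.doc_embedding` row created at import time, a worker pass that picks up pending and failed rows, and the compatibility write-back into `TED.procurement_document.content_vector`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical in-memory sketch; names and structure are assumptions for illustration.
class Phase2Sketch {
    enum Status { PENDING, COMPLETED, FAILED }

    // Stand-in for one DOC.doc_embedding row plus the text it is computed from.
    static final class DocEmbedding {
        final long tedDocumentId;   // assumed link back to the TED document
        final String text;          // stand-in for DOC.doc_text_representation
        Status status = Status.PENDING;
        float[] vector;

        DocEmbedding(long tedDocumentId, String text) {
            this.tedDocumentId = tedDocumentId;
            this.text = text;
        }
    }

    // Stand-in for the DOC.doc_embedding table.
    final List<DocEmbedding> docEmbeddings = new ArrayList<>();
    // Stand-in for TED.procurement_document.content_vector, keyed by TED document id.
    final Map<Long, float[]> legacyContentVector = new HashMap<>();
    final boolean compatibilityMode;

    Phase2Sketch(boolean compatibilityMode) {
        this.compatibilityMode = compatibilityMode;
    }

    // Dual-write on TED import: only the text and the pending embedding are modeled here.
    DocEmbedding importTedDocument(long tedDocumentId, String text) {
        DocEmbedding embedding = new DocEmbedding(tedDocumentId, text);
        docEmbeddings.add(embedding);
        return embedding;
    }

    // One pass of the asynchronous worker: handles PENDING and FAILED rows.
    void processPending(Function<String, float[]> embedder) {
        for (DocEmbedding e : docEmbeddings) {
            if (e.status == Status.COMPLETED) {
                continue;
            }
            try {
                e.vector = embedder.apply(e.text);
                e.status = Status.COMPLETED;
                if (compatibilityMode) {
                    // Write-back so the legacy semantic search still finds a vector.
                    legacyContentVector.put(e.tedDocumentId, e.vector);
                }
            } catch (RuntimeException ex) {
                e.status = Status.FAILED; // picked up again on the next pass
            }
        }
    }

    public static void main(String[] args) {
        Phase2Sketch sketch = new Phase2Sketch(true);
        DocEmbedding e = sketch.importTedDocument(42L, "notice text");

        // First pass: the (fake) embedding model is unavailable.
        sketch.processPending(t -> { throw new IllegalStateException("model down"); });
        System.out.println(e.status); // prints FAILED

        // Second pass: a toy embedder that returns the text length as a 1-d vector.
        sketch.processPending(t -> new float[] { t.length() });
        System.out.println(e.status + " " + sketch.legacyContentVector.containsKey(42L)); // prints COMPLETED true
    }
}
```

With `compatibilityMode` set to `false`, the same pass would complete the generic embedding but leave the legacy map untouched, which is one way to picture switching the compatibility write-back off in a later phase.
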

This phase is intentionally additive: it does not yet migrate TED semantic search reads away from the legacy `TED.procurement_document` table.