Phase 2 - Vectorization decoupling
Phase 2 moves the primary vectorization pipeline from TED.procurement_document to the generic DOC
representation and embedding model introduced in Phase 1.
Implemented in this phase:
- DOC.doc_text_representation is now the primary text source for embeddings
- DOC.doc_embedding is the primary persistence target for embedding lifecycle and vectors
- a generic Camel route processes pending/failed embeddings asynchronously
- TED imports dual-write into the generic model by creating:
  - canonical DOC.doc_document
  - original DOC.doc_content
  - primary DOC.doc_text_representation
  - pending DOC.doc_embedding
- compatibility mode keeps writing completed TED embeddings back into TED.procurement_document.content_vector so the legacy semantic search continues to work
This phase is intentionally additive and does not yet migrate TED semantic search reads away from the legacy table.
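
The embedding lifecycle listed above can be sketched in plain Java. This is a minimal in-memory illustration, not the project's implementation: the class EmbeddingLifecycleSketch, the Status enum, the method names, and the in-memory collections standing in for DOC.doc_embedding and TED.procurement_document.content_vector are all assumptions made for the example. The real pipeline runs as a Camel route against the database, and the dual-write also creates the document, content, and text representation rows, which are omitted here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class EmbeddingLifecycleSketch {

    // Hypothetical lifecycle states; the source only names "pending", "failed" and "completed".
    enum Status { PENDING, FAILED, COMPLETED }

    // Stand-in for one DOC.doc_embedding row.
    static final class DocEmbedding {
        final String documentId;
        final String text;          // text of the primary DOC.doc_text_representation
        Status status = Status.PENDING;
        float[] vector;

        DocEmbedding(String documentId, String text) {
            this.documentId = documentId;
            this.text = text;
        }
    }

    // In-memory stand-ins for the two persistence targets.
    final List<DocEmbedding> docEmbeddings = new ArrayList<>();
    final Map<String, float[]> legacyContentVectors = new HashMap<>();
    final boolean compatibilityMode;

    EmbeddingLifecycleSketch(boolean compatibilityMode) {
        this.compatibilityMode = compatibilityMode;
    }

    // Dual-write step: a TED import registers a pending embedding in the generic model.
    DocEmbedding registerPending(String documentId, String text) {
        DocEmbedding embedding = new DocEmbedding(documentId, text);
        docEmbeddings.add(embedding);
        return embedding;
    }

    // One pass of the asynchronous worker: retries everything that is pending or failed.
    // Returns the number of embeddings completed in this pass.
    int processOutstanding(Function<String, float[]> embedder) {
        int completed = 0;
        for (DocEmbedding embedding : docEmbeddings) {
            if (embedding.status == Status.COMPLETED) {
                continue;
            }
            try {
                embedding.vector = embedder.apply(embedding.text);
                embedding.status = Status.COMPLETED;
                completed++;
                if (compatibilityMode) {
                    // Write-back that keeps the legacy semantic search working.
                    legacyContentVectors.put(embedding.documentId, embedding.vector);
                }
            } catch (RuntimeException ex) {
                // Left as FAILED so a later pass can pick it up again.
                embedding.status = Status.FAILED;
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        EmbeddingLifecycleSketch sketch = new EmbeddingLifecycleSketch(true);
        sketch.registerPending("ted-1", "road maintenance services");
        // Placeholder embedder; a real one would call an embedding model.
        int done = sketch.processOutstanding(text -> new float[] { text.length() });
        System.out.println(done + " completed, legacy rows: " + sketch.legacyContentVectors.size());
    }
}
```

Because the legacy write-back happens only after an embedding completes and only when compatibility mode is on, the generic model remains the primary target and the legacy column is a derived copy that can later be switched off.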