Phase 2 - Vectorization decoupling

Phase 2 moves the primary vectorization pipeline from TED.procurement_document to the generic DOC representation and embedding model introduced in Phase 1.

Implemented in this phase:

  • DOC.doc_text_representation is now the primary text source for embeddings
  • DOC.doc_embedding is the primary persistence target for embedding lifecycle and vectors
  • a generic Camel route processes pending/failed embeddings asynchronously
  • TED imports dual-write into the generic model by creating:
    • canonical DOC.doc_document
    • original DOC.doc_content
    • primary DOC.doc_text_representation
    • pending DOC.doc_embedding
  • compatibility mode keeps writing completed TED embeddings back into TED.procurement_document.content_vector so the legacy semantic search continues to work
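
The dual-write performed by a TED import can be sketched as follows. This is a minimal in-memory illustration, not the project's code: the record names mirror the DOC tables listed above, but every field, the `GenericStore` class, the `dualWrite` method, and the `COMPLETED` status are assumptions made for the example (the source only names the pending and failed states).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

class DualWriteSketch {

    // Hypothetical lifecycle states; only PENDING and FAILED are named in the text above.
    enum EmbeddingStatus { PENDING, FAILED, COMPLETED }

    // Hypothetical stand-ins for rows in the four DOC tables.
    record DocDocument(UUID id, String sourceSystem, String sourceKey) {}
    record DocContent(UUID documentId, String originalPayload) {}
    record DocTextRepresentation(UUID id, UUID documentId, String text, boolean primary) {}
    record DocEmbedding(UUID textRepresentationId, EmbeddingStatus status) {}

    // In-memory replacement for the real persistence layer (assumption).
    static final class GenericStore {
        final List<DocDocument> documents = new ArrayList<>();
        final List<DocContent> contents = new ArrayList<>();
        final List<DocTextRepresentation> representations = new ArrayList<>();
        final List<DocEmbedding> embeddings = new ArrayList<>();
    }

    // Creates the four generic records for one imported TED document
    // and returns the id of the canonical document.
    static UUID dualWrite(GenericStore store, String tedKey,
                          String originalPayload, String extractedText) {
        UUID documentId = UUID.randomUUID();
        UUID representationId = UUID.randomUUID();

        store.documents.add(new DocDocument(documentId, "TED", tedKey));
        store.contents.add(new DocContent(documentId, originalPayload));
        store.representations.add(
                new DocTextRepresentation(representationId, documentId, extractedText, true));
        // The embedding starts as PENDING so an asynchronous worker can pick it up later.
        store.embeddings.add(new DocEmbedding(representationId, EmbeddingStatus.PENDING));

        return documentId;
    }

    public static void main(String[] args) {
        GenericStore store = new GenericStore();
        dualWrite(store, "example-notice", "<notice/>", "Example notice text");
        System.out.println("pending embeddings: " + store.embeddings.size());
    }
}
```

In this sketch the embedding row references the text representation rather than the document, which would allow several representations per document; whether the real schema links them this way is not stated here.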

This phase is intentionally additive: it does not yet migrate TED semantic search reads away from the legacy TED.procurement_document table.
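
To illustrate why the phase is additive, the sketch below processes pending and failed embeddings and, when compatibility mode is on, also copies each completed vector into a stand-in for the legacy content_vector column. It is plain Java with in-memory collections, not the actual Camel route: `Embedding`, `processBatch`, the `compatibilityMode` flag, the `legacyVectors` map, and the toy model are all hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class EmbeddingProcessorSketch {

    enum Status { PENDING, FAILED, COMPLETED }

    // Hypothetical stand-in for one DOC.doc_embedding row joined with its text.
    static final class Embedding {
        final String documentKey;
        final String text;
        Status status = Status.PENDING;
        float[] vector;

        Embedding(String documentKey, String text) {
            this.documentKey = documentKey;
            this.text = text;
        }
    }

    // Processes every PENDING or FAILED embedding and returns how many completed.
    // legacyVectors stands in for TED.procurement_document.content_vector.
    static int processBatch(List<Embedding> embeddings,
                            Function<String, float[]> model,
                            boolean compatibilityMode,
                            Map<String, float[]> legacyVectors) {
        int completed = 0;
        for (Embedding e : embeddings) {
            if (e.status == Status.COMPLETED) {
                continue; // already done, nothing to retry
            }
            try {
                e.vector = model.apply(e.text);
                e.status = Status.COMPLETED;
                completed++;
                if (compatibilityMode) {
                    // Write-back keeps the legacy search path supplied with vectors.
                    legacyVectors.put(e.documentKey, e.vector);
                }
            } catch (RuntimeException ex) {
                // Leave the row as FAILED so a later run can retry it.
                e.status = Status.FAILED;
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        // Toy model: rejects empty text, otherwise returns a one-element vector.
        Function<String, float[]> toyModel = text -> {
            if (text.isEmpty()) {
                throw new IllegalArgumentException("empty text");
            }
            return new float[] { text.length() };
        };

        List<Embedding> batch = new ArrayList<>();
        batch.add(new Embedding("doc-a", "hello"));
        batch.add(new Embedding("doc-b", ""));

        Map<String, float[]> legacy = new HashMap<>();
        int done = processBatch(batch, toyModel, true, legacy);
        System.out.println("completed=" + done + ", legacy rows=" + legacy.size());
    }
}
```

With the flag switched off, the generic model is still fully populated and only the legacy write is skipped, which is what would make a later migration of the search reads possible without changing the processing step.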