# Phase 2 - Vectorization decoupling

Phase 2 moves the primary vectorization pipeline from `TED.procurement_document` to the generic `DOC`
representation and embedding model introduced in Phase 1.

Implemented in this phase:

- `DOC.doc_text_representation` is now the primary text source for embeddings
- `DOC.doc_embedding` is the primary persistence target for embedding lifecycle and vectors
- a generic Camel route processes pending/failed embeddings asynchronously
- TED imports dual-write into the generic model by creating:
  - canonical `DOC.doc_document`
  - original `DOC.doc_content`
  - primary `DOC.doc_text_representation`
  - pending `DOC.doc_embedding`
- compatibility mode keeps writing completed TED embeddings back into
  `TED.procurement_document.content_vector` so the legacy semantic search continues to work

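
The embedding lifecycle described in the list above can be illustrated with a minimal, in-memory sketch. It is not the actual Camel route or schema: the class `EmbeddingLifecycleSketch`, the `Status` values, the `Embedding` fields, and the `processBatch` signature are all hypothetical names chosen for illustration, and plain maps and lists stand in for `DOC.doc_embedding` and `TED.procurement_document.content_vector`. The sketch assumes that pending and failed rows are both picked up on each pass, that a failure leaves the row eligible for retry, and that the legacy write-back happens only for completed embeddings when compatibility mode is on.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, in-memory illustration of the pending/failed -> completed lifecycle.
class EmbeddingLifecycleSketch {

    // Assumed lifecycle states; the real column values may differ.
    public enum Status { PENDING, FAILED, COMPLETED }

    // Stand-in for one DOC.doc_embedding row plus the text it embeds.
    public static final class Embedding {
        public final String documentId;
        public final String text; // would come from DOC.doc_text_representation
        public Status status = Status.PENDING;
        public float[] vector;

        public Embedding(String documentId, String text) {
            this.documentId = documentId;
            this.text = text;
        }
    }

    /**
     * Processes every row that is not yet COMPLETED and returns how many
     * rows were completed in this pass.
     *
     * @param legacyVectors stand-in for TED.procurement_document.content_vector,
     *                      keyed by document id
     */
    public static int processBatch(List<Embedding> rows,
                                   Function<String, float[]> embedder,
                                   Map<String, float[]> legacyVectors,
                                   boolean compatibilityMode) {
        int completed = 0;
        for (Embedding row : rows) {
            if (row.status == Status.COMPLETED) {
                continue; // only pending and failed rows are (re)processed
            }
            try {
                row.vector = embedder.apply(row.text);
                row.status = Status.COMPLETED;
                completed++;
                if (compatibilityMode) {
                    // Write-back keeps the legacy semantic search working.
                    legacyVectors.put(row.documentId, row.vector);
                }
            } catch (RuntimeException e) {
                // Leave the row eligible for the next pass.
                row.status = Status.FAILED;
            }
        }
        return completed;
    }
}
```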
This phase is intentionally additive and does not yet migrate TED semantic search reads away from the legacy table.