# Phase 2 - Representation-based vectorization and dual-write compatibility

## Goal

Decouple vectorization from the TED document entity so arbitrary document types can use a shared representation-to-embedding pipeline.

## Primary changes

1. **Primary vectorization source**
   - before: `TED.procurement_document.text_content`
   - now: `DOC.doc_text_representation.text_body`

2. **Primary vectorization target**
   - before: `TED.procurement_document.content_vector`
   - now: `DOC.doc_embedding.embedding_vector`

3. **Compatibility during migration**
   - completed embeddings are optionally mirrored back to the legacy TED vector columns using the shared TED document hash (`document_hash` / `dedup_hash`)

4. **TED dual-write bridge**
   - fresh TED documents are projected into the generic `DOC` model immediately after persistence

## Key services introduced

- `TedPhase2GenericDocumentService` - creates/refreshes generic DOC records for TED notices
- `DocumentEmbeddingProcessingService` - processes DOC embedding lifecycle records
- `GenericVectorizationRoute` - scheduler + worker route for asynchronous DOC embedding generation
- `ConfiguredEmbeddingModelStartupRunner` - ensures the configured embedding model exists in `DOC.doc_embedding_model`
- `GenericVectorizationStartupRunner` - queues pending/failed DOC embeddings on startup

## Behavior when Phase 2 is enabled

- legacy `VectorizationRoute` is disabled
- legacy startup queueing is disabled
- legacy event-based vectorization queueing is disabled
- generic scheduler and startup runner handle DOC embeddings instead

## Compatibility intent

This phase keeps the existing TED search endpoints working while the new generic indexing layer becomes operational. The next phase can migrate search reads from the TED table to `DOC.doc_embedding`.
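
The optional mirroring of completed embeddings back to the legacy TED vector columns (item 3 under "Primary changes") could look roughly like the sketch below. It is illustrative only, not the project's actual implementation: the class `LegacyMirrorSketch`, the record `DocEmbedding`, the `mirrorToLegacy` flag, and the in-memory map standing in for `TED.procurement_document.content_vector` are all hypothetical names introduced here. The only detail taken from the text above is that the legacy row is located via the shared document hash.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: in-memory stand-ins replace the real DOC and TED tables.
final class LegacyMirrorSketch {

    // Assumed shape of a completed DOC.doc_embedding row, carrying the shared document hash.
    record DocEmbedding(String documentHash, float[] embeddingVector) {}

    // Stand-in for TED.procurement_document.content_vector, keyed by document hash.
    private final Map<String, float[]> legacyContentVectors = new HashMap<>();

    // Assumed configuration toggle: mirroring is optional during the migration.
    private final boolean mirrorToLegacy;

    LegacyMirrorSketch(boolean mirrorToLegacy) {
        this.mirrorToLegacy = mirrorToLegacy;
    }

    // Registers a legacy TED document that does not have a vector yet.
    void registerLegacyDocument(String documentHash) {
        legacyContentVectors.put(documentHash, null);
    }

    // Called when a DOC embedding completes; returns true only if a legacy row was updated.
    boolean onEmbeddingCompleted(DocEmbedding embedding) {
        if (!mirrorToLegacy) {
            return false; // mirroring switched off: the DOC model is the only target
        }
        if (!legacyContentVectors.containsKey(embedding.documentHash())) {
            return false; // no TED row shares this hash, e.g. a non-TED document type
        }
        // Copy the array so later changes to the DOC vector do not leak into the legacy row.
        legacyContentVectors.put(embedding.documentHash(), embedding.embeddingVector().clone());
        return true;
    }

    // Reads the mirrored vector, if one has been written for this hash.
    Optional<float[]> legacyVector(String documentHash) {
        return Optional.ofNullable(legacyContentVectors.get(documentHash));
    }

    public static void main(String[] args) {
        LegacyMirrorSketch bridge = new LegacyMirrorSketch(true);
        bridge.registerLegacyDocument("hash-1");
        DocEmbedding done = new DocEmbedding("hash-1", new float[] {0.1f, 0.2f, 0.3f});
        System.out.println("mirrored: " + bridge.onEmbeddingCompleted(done));
        System.out.println("legacy vector present: " + bridge.legacyVector("hash-1").isPresent());
    }
}
```

Matching on the hash rather than on a TED primary key keeps the generic side free of any TED-specific identifier, which is consistent with the decoupling goal stated above. Embeddings for documents with no TED counterpart are simply skipped by the mirror.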