DIP/docs/architecture/PHASE3_TED_PROJECTION_MODEL.md

# Phase 3 - TED projection model

## Goal

Move TED from being the implicit root data model to being a typed projection on top of the generic
canonical document model.

## New persistence model

### Generic root
- `DOC.doc_document`
- `DOC.doc_content`
- `DOC.doc_text_representation`
- `DOC.doc_embedding`

### TED-specific projection
- `TED.ted_notice_projection`
- `TED.ted_notice_lot`
- `TED.ted_notice_organization`

## Relationship model

For each document, the model holds:
- exactly one generic `DOC.doc_document`
- zero or one `TED.ted_notice_projection`
- zero to many `TED.ted_notice_lot`
- zero to many `TED.ted_notice_organization`

The projection also keeps an optional back-reference to the legacy `TED.procurement_document` row to
support incremental migration and validation.
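
The cardinalities above can be pictured with a small in-memory sketch. This is illustrative only: the class names mirror the table names, but every field (`document_type`, `lot_number`, `name`, `legacy_procurement_document_id`) and the `TedDocumentView` container are hypothetical and are not taken from the actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DocDocument:
    """Stands in for a DOC.doc_document row (the generic root)."""
    id: int
    document_type: str  # hypothetical discriminator column


@dataclass
class TedNoticeProjection:
    """Stands in for a TED.ted_notice_projection row."""
    document_id: int
    # Optional back-reference to the legacy TED.procurement_document row,
    # kept for incremental migration and validation (field name assumed).
    legacy_procurement_document_id: Optional[int] = None


@dataclass
class TedNoticeLot:
    """Stands in for a TED.ted_notice_lot row."""
    lot_number: str


@dataclass
class TedNoticeOrganization:
    """Stands in for a TED.ted_notice_organization row."""
    name: str


@dataclass
class TedDocumentView:
    """Hypothetical aggregate: one root, 0..1 projection, 0..n lots and organizations."""
    document: DocDocument
    projection: Optional[TedNoticeProjection] = None
    lots: List[TedNoticeLot] = field(default_factory=list)
    organizations: List[TedNoticeOrganization] = field(default_factory=list)

    def has_legacy_link(self) -> bool:
        # True only when a projection exists and still points at a legacy row.
        return (
            self.projection is not None
            and self.projection.legacy_procurement_document_id is not None
        )
```

A document with no TED projection is still a valid `TedDocumentView`, which is the point of making TED a projection rather than the root.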

## Runtime behavior

When a new TED XML document is imported:
1. it is parsed into the existing legacy `ProcurementDocument`
2. the generic DOC root is ensured/refreshed
3. the primary text representation is ensured
4. if the generic vectorization pipeline is enabled, a pending embedding is ensured
5. the TED structured projection tables are refreshed from the parsed legacy document
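
The five steps above can be sketched as a single import function. This is a minimal sketch under stated assumptions, not the platform's implementation: `InMemoryStore`, `parse_legacy`, `import_ted_xml`, and the toy XML layout are all hypothetical, and "ensured" is read here as "created if missing".

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the DOC and TED tables, keyed by notice id."""
    documents: Dict[str, dict] = field(default_factory=dict)
    text_representations: Dict[str, str] = field(default_factory=dict)
    pending_embeddings: Set[str] = field(default_factory=set)
    projections: Dict[str, dict] = field(default_factory=dict)


def parse_legacy(xml_text: str) -> dict:
    # Step 1: toy parser. The element names are invented for this example
    # and do not follow the real TED XML schema.
    root = ET.fromstring(xml_text)
    return {
        "notice_id": root.get("id"),
        "title": root.findtext("title", default=""),
        "lots": [lot.text for lot in root.findall("lot")],
    }


def import_ted_xml(store: InMemoryStore, xml_text: str, vectorization_enabled: bool) -> str:
    legacy = parse_legacy(xml_text)  # step 1: parse into the legacy shape
    key = legacy["notice_id"]

    # Step 2: ensure/refresh the generic DOC root (create or overwrite).
    store.documents[key] = {"type": "TED_NOTICE", "title": legacy["title"]}

    # Step 3: ensure the primary text representation (create only if missing).
    store.text_representations.setdefault(key, legacy["title"])

    # Step 4: ensure a pending embedding, only when vectorization is enabled.
    if vectorization_enabled:
        store.pending_embeddings.add(key)

    # Step 5: refresh the TED projection from the parsed legacy document.
    store.projections[key] = {"lots": list(legacy["lots"])}
    return key
```

Because each step either creates a missing row or overwrites an existing one, importing the same notice twice in this sketch leaves a single root, a single text representation, and a single projection.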

## Why this phase matters

This is the first phase where TED is explicitly modeled as a document type projection instead of the
platform's canonical root entity. That makes the next steps possible:
- generic semantic search across multiple document types
- future non-TED projections
- migration of TED structured search to the new projection tables