# Phase 3 - TED projection model
## Goal
Move TED from being the implicit root data model to being a typed projection on top of the generic
canonical document model.
## New persistence model
### Generic root
- `DOC.doc_document`
- `DOC.doc_content`
- `DOC.doc_text_representation`
- `DOC.doc_embedding`
### TED-specific projection
- `TED.ted_notice_projection`
- `TED.ted_notice_lot`
- `TED.ted_notice_organization`
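
To make the split between the generic root and the TED projection concrete, here is a minimal Python sketch of the four DOC tables as in-memory records. The field names (`document_type`, `source_key`, `is_primary`, `status`, `vector`) and the choice to key every DOC row on the document id are assumptions for illustration, not the actual schema. The point is only that nothing in the DOC layer refers to TED.

```python
from dataclasses import dataclass
from typing import Optional
from uuid import UUID, uuid4

# Hypothetical in-memory mirrors of the generic DOC tables. Field names are
# illustrative assumptions, not the real column names. Nothing here refers
# to TED: the document type is only a value on the root row.

@dataclass
class DocDocument:                     # DOC.doc_document
    id: UUID
    document_type: str                 # assumed discriminator, e.g. "TED_NOTICE"
    source_key: str                    # assumed stable key from the source system

@dataclass
class DocContent:                      # DOC.doc_content
    document_id: UUID                  # FK -> DocDocument.id
    mime_type: str
    body: bytes

@dataclass
class DocTextRepresentation:           # DOC.doc_text_representation
    document_id: UUID                  # FK -> DocDocument.id
    is_primary: bool
    text: str

@dataclass
class DocEmbedding:                    # DOC.doc_embedding
    document_id: UUID                  # FK -> DocDocument.id
    status: str                        # assumed values: "PENDING", "DONE"
    vector: Optional[list[float]] = None  # None until the vector is computed

# A TED notice and a hypothetical non-TED document share the same root shape.
ted_doc = DocDocument(id=uuid4(), document_type="TED_NOTICE", source_key="example-notice-1")
other_doc = DocDocument(id=uuid4(), document_type="OTHER", source_key="example-doc-1")
pending = DocEmbedding(document_id=ted_doc.id, status="PENDING")
```
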
## Relationship model
Each generic `DOC.doc_document` row is associated with:
- zero or one `TED.ted_notice_projection`
- zero to many `TED.ted_notice_lot` rows
- zero to many `TED.ted_notice_organization` rows

The projection also keeps an optional back-reference to the legacy `TED.procurement_document` row to
support incremental migration and validation.
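
The cardinalities above can be checked mechanically. The Python sketch below mirrors the three TED tables with hypothetical field names and flags projection, lot, or organization rows that point at a missing document, as well as a second projection for the same document. It assumes lots and organizations reference the document id directly; in the real schema they may instead reference the projection row.

```python
from dataclasses import dataclass
from typing import Iterable, Optional
from uuid import UUID, uuid4

# Hypothetical mirrors of the TED projection tables; field names are assumptions.

@dataclass
class TedNoticeProjection:             # TED.ted_notice_projection
    document_id: UUID                  # FK -> DOC.doc_document, at most one per document
    title: str
    legacy_procurement_document_id: Optional[int] = None  # optional back-ref to TED.procurement_document

@dataclass
class TedNoticeLot:                    # TED.ted_notice_lot
    document_id: UUID                  # assumed to reference the document directly
    lot_number: str

@dataclass
class TedNoticeOrganization:           # TED.ted_notice_organization
    document_id: UUID
    name: str

def relationship_errors(
    document_ids: set[UUID],
    projections: Iterable[TedNoticeProjection],
    lots: Iterable[TedNoticeLot],
    organizations: Iterable[TedNoticeOrganization],
) -> list[str]:
    """List violations of the 1 : 0..1 : 0..n : 0..n relationship model."""
    errors: list[str] = []
    projected: set[UUID] = set()
    for p in projections:
        if p.document_id not in document_ids:
            errors.append(f"projection without document {p.document_id}")
        if p.document_id in projected:
            errors.append(f"second projection for document {p.document_id}")
        projected.add(p.document_id)
    # Lots and organizations may be absent, but must not be orphaned.
    for row in [*lots, *organizations]:
        if row.document_id not in document_ids:
            errors.append(f"{type(row).__name__} without document {row.document_id}")
    return errors

# One projected TED document, plus a document with no TED rows at all.
ted_id, other_id = uuid4(), uuid4()
print(relationship_errors({ted_id, other_id},
                          [TedNoticeProjection(ted_id, "Road works")],
                          [TedNoticeLot(ted_id, "1")], []))  # prints []
```

A document without any TED rows passes the check, which is what lets non-TED document types share the same DOC root.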
## Runtime behavior
When a new TED XML document is imported:
1. it is parsed into the existing legacy `ProcurementDocument`
2. the generic DOC root is created if it does not exist yet, or refreshed if it does
3. the primary text representation is ensured
4. if the generic vectorization pipeline is enabled, a pending embedding is ensured
5. the TED structured projection tables are refreshed from the parsed legacy document
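
The five import steps can be sketched roughly as follows. This is a hypothetical in-memory model, not the real importer: the XML parsing is faked with a `|`-separated string, and the names `Store`, `import_ted_xml` and `LegacyProcurementDocument` are illustrative. It reads "ensure" as create-if-missing and "refresh" as overwrite, so re-importing the same notice updates the root and the projection without duplicating rows.

```python
from dataclasses import dataclass, field
from uuid import UUID, uuid4

# Hypothetical stand-in for the legacy ProcurementDocument.
@dataclass
class LegacyProcurementDocument:
    notice_id: str
    title: str
    lots: list[str]

# Hypothetical in-memory store; each dict stands in for one table.
@dataclass
class Store:
    documents: dict[str, UUID] = field(default_factory=dict)    # source key -> DOC id
    titles: dict[UUID, str] = field(default_factory=dict)       # refreshed root metadata
    text_reps: dict[UUID, str] = field(default_factory=dict)    # primary text per document
    embeddings: dict[UUID, str] = field(default_factory=dict)   # DOC id -> embedding status
    projections: dict[UUID, str] = field(default_factory=dict)  # DOC id -> projected title
    lots: dict[UUID, list[str]] = field(default_factory=dict)   # DOC id -> lot rows

def parse_legacy(xml: str) -> LegacyProcurementDocument:
    # Placeholder for real TED XML parsing: "notice_id|title|lot|lot|..."
    notice_id, title, *lots = xml.split("|")
    return LegacyProcurementDocument(notice_id, title, lots)

def import_ted_xml(store: Store, xml: str, vectorization_enabled: bool) -> UUID:
    legacy = parse_legacy(xml)                                        # 1. legacy parse
    doc_id = store.documents.setdefault(legacy.notice_id, uuid4())   # 2. create root if missing...
    store.titles[doc_id] = legacy.title                               #    ...and refresh it
    store.text_reps.setdefault(doc_id, "\n".join([legacy.title, *legacy.lots]))  # 3. ensure text
    if vectorization_enabled:
        store.embeddings.setdefault(doc_id, "PENDING")                # 4. ensure pending embedding
    store.projections[doc_id] = legacy.title                          # 5. refresh projection;
    store.lots[doc_id] = list(legacy.lots)                            #    child rows replaced, not appended
    return doc_id
```

Replacing the lot list wholesale in step 5, rather than appending to it, is what keeps repeated imports of the same notice from accumulating duplicate child rows in this sketch.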
## Why this phase matters
This is the first phase in which TED is explicitly modeled as a document-type projection rather than as the
platform's canonical root entity. This makes the following steps possible:
- generic semantic search across multiple document types
- future non-TED projections
- migration of TED structured search to the new projection tables