# Phase 3 - TED projection model

## Goal

Move TED from being the implicit root data model to being a typed projection on top of the generic canonical document model.

## New persistence model

### Generic root

- `DOC.doc_document`
- `DOC.doc_content`
- `DOC.doc_text_representation`
- `DOC.doc_embedding`

### TED-specific projection

- `TED.ted_notice_projection`
- `TED.ted_notice_lot`
- `TED.ted_notice_organization`

## Relationship model

Each document maps to:

- one generic `DOC.doc_document`
- zero or one `TED.ted_notice_projection`
- zero to many `TED.ted_notice_lot`
- zero to many `TED.ted_notice_organization`

The projection also keeps an optional back-reference to the legacy `TED.procurement_document` row to support incremental migration and validation.
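
To make the cardinalities concrete, here is a minimal Python sketch that mirrors the relationship model as in-memory dataclasses. It is not the actual persistence code: the class names, the `document_type` discriminator, the column names, and the assumption that lots and organizations reference the generic document (rather than the projection row) are all hypothetical. Only the table names, the cardinalities, and the optional legacy back-reference come from the model above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional
from uuid import UUID, uuid4


# Hypothetical in-memory mirrors of the tables; only the ids and foreign keys
# needed to express the relationships are modeled. Column names are assumptions.
@dataclass
class DocDocument:  # DOC.doc_document: the generic, type-agnostic root
    id: UUID
    document_type: str  # e.g. "TED_NOTICE" (assumed discriminator)


@dataclass
class TedNoticeProjection:  # TED.ted_notice_projection: at most one per document
    id: UUID
    document_id: UUID  # assumed FK -> DOC.doc_document.id
    # Optional back-reference to the legacy TED.procurement_document row,
    # kept only for incremental migration and validation.
    legacy_procurement_document_id: Optional[UUID] = None


@dataclass
class TedNoticeLot:  # TED.ted_notice_lot: zero to many per document
    id: UUID
    document_id: UUID
    title: str


@dataclass
class TedNoticeOrganization:  # TED.ted_notice_organization: zero to many per document
    id: UUID
    document_id: UUID
    name: str


@dataclass
class TedDocumentView:
    """One generic root plus its optional TED projection and child rows."""
    document: DocDocument
    projection: Optional[TedNoticeProjection] = None  # "zero or one"
    lots: list[TedNoticeLot] = field(default_factory=list)  # "zero to many"
    organizations: list[TedNoticeOrganization] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if any TED row points at a different generic root."""
        doc_id = self.document.id
        if self.projection is not None and self.projection.document_id != doc_id:
            raise ValueError("projection belongs to a different document")
        for row in [*self.lots, *self.organizations]:
            if row.document_id != doc_id:
                raise ValueError(f"{type(row).__name__} {row.id} belongs to a different document")


# Usage: a TED document with a projection (no legacy link yet) and one lot.
doc = DocDocument(id=uuid4(), document_type="TED_NOTICE")
view = TedDocumentView(
    document=doc,
    projection=TedNoticeProjection(id=uuid4(), document_id=doc.id),
    lots=[TedNoticeLot(id=uuid4(), document_id=doc.id, title="Lot 1")],
)
view.validate()
```

In the database, the "zero or one" rule would more likely be enforced by a unique constraint on the projection's document reference; the `Optional` field here only mirrors that rule in memory.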

## Runtime behavior

When a new TED XML document is imported:

1. it is parsed into the existing legacy `ProcurementDocument`
2. the generic DOC root is created if missing, otherwise refreshed
3. the primary text representation is ensured (created if missing)
4. if the generic vectorization pipeline is enabled, a pending embedding is ensured (created if missing)
5. the TED structured projection tables are refreshed from the parsed legacy document
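
The import sequence above can be sketched as follows. This is a hedged, self-contained Python illustration, not the platform's importer: `Store`, `parse_legacy`, `import_ted_xml`, the toy XML layout, the `ted:<notice id>` key, and the choice of text for the primary representation are all hypothetical. It only shows the ordering of the five steps and one possible reading of the "ensure/refresh" semantics.

```python
from __future__ import annotations

import hashlib
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class ProcurementDocument:
    """Toy stand-in for the legacy parsed model; fields are hypothetical."""
    notice_id: str
    title: str
    lots: list[str]


@dataclass
class Store:
    """Hypothetical in-memory stand-in for the DOC and TED tables."""
    doc_document: dict = field(default_factory=dict)             # doc_id -> root row
    doc_content: dict = field(default_factory=dict)              # doc_id -> raw XML
    doc_text_representation: dict = field(default_factory=dict)  # (doc_id, kind) -> text
    doc_embedding: dict = field(default_factory=dict)            # (doc_id, text_hash) -> status
    ted_notice_projection: dict = field(default_factory=dict)    # doc_id -> projection row
    ted_notice_lot: dict = field(default_factory=dict)           # doc_id -> list of lot titles


def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def parse_legacy(xml_text: str) -> ProcurementDocument:
    """Step 1: parse into the legacy model (toy XML layout, not real TED XML)."""
    root = ET.fromstring(xml_text)
    return ProcurementDocument(
        notice_id=root.findtext("NoticeId", default=""),
        title=root.findtext("Title", default=""),
        lots=[lot.text or "" for lot in root.findall("Lot")],
    )


def import_ted_xml(store: Store, xml_text: str, vectorization_enabled: bool) -> str:
    legacy = parse_legacy(xml_text)
    doc_id = f"ted:{legacy.notice_id}"  # hypothetical stable key for the generic root

    # Step 2: create the generic root if missing, otherwise overwrite (refresh) it.
    store.doc_document[doc_id] = {"type": "TED_NOTICE", "content_hash": sha256(xml_text)}
    store.doc_content[doc_id] = xml_text

    # Step 3: ensure the primary text representation (toy choice: title + lot titles).
    text = "\n".join([legacy.title, *legacy.lots])
    store.doc_text_representation[(doc_id, "PRIMARY")] = text

    # Step 4: ensure a pending embedding keyed by the text hash, so unchanged
    # text never queues a second embedding while changed text gets a new one.
    if vectorization_enabled:
        store.doc_embedding.setdefault((doc_id, sha256(text)), "PENDING")

    # Step 5: refresh the TED projection by replacing its rows wholesale.
    store.ted_notice_projection[doc_id] = {"notice_id": legacy.notice_id, "title": legacy.title}
    store.ted_notice_lot[doc_id] = list(legacy.lots)
    return doc_id
```

Because every step either overwrites its rows or uses `setdefault`, importing the same XML twice in this sketch leaves one root, one text representation, and one pending embedding; whether the real pipeline re-queues embeddings on changed text in exactly this way is an assumption.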

## Why this phase matters

This is the first phase where TED is explicitly modeled as a document-type projection instead of the platform's canonical root entity. That makes the next steps possible:

- generic semantic search across multiple document types
- future non-TED projections
- migration of TED structured search to the new projection tables