
Phase 3 - TED projection model

Goal

Move TED from being the implicit root data model to being a typed projection on top of the generic canonical document model.

New persistence model

Generic root

  • DOC.doc_document
  • DOC.doc_content
  • DOC.doc_text_representation
  • DOC.doc_embedding

TED-specific projection

  • TED.ted_notice_projection
  • TED.ted_notice_lot
  • TED.ted_notice_organization

Relationship model

  • one generic DOC.doc_document
  • zero or one TED.ted_notice_projection
  • zero to many TED.ted_notice_lot
  • zero to many TED.ted_notice_organization

The projection also keeps an optional back-reference to the legacy TED.procurement_document row to support incremental migration and validation.
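The cardinalities above can be sketched as plain data classes. This is only an illustration, not the actual schema: every field name is hypothetical, and hanging the lots and organizations directly off the document (rather than off the projection row) is an assumption the text above does not settle.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TedNoticeLot:
    lot_number: str  # hypothetical field


@dataclass
class TedNoticeOrganization:
    name: str  # hypothetical field


@dataclass
class TedNoticeProjection:
    # Optional back-reference to the legacy TED.procurement_document row,
    # kept for incremental migration and validation.
    legacy_procurement_document_id: Optional[str] = None


@dataclass
class DocDocument:
    """Generic root: exactly one per document, TED or not."""
    document_id: str
    # Zero or one TED projection.
    ted_projection: Optional[TedNoticeProjection] = None
    # Zero to many lots and organizations (attachment point is an assumption).
    ted_lots: List[TedNoticeLot] = field(default_factory=list)
    ted_organizations: List[TedNoticeOrganization] = field(default_factory=list)

    def is_ted(self) -> bool:
        # A document counts as TED only when a projection row exists.
        return self.ted_projection is not None


# A non-TED document carries no projection, lots, or organizations.
plain = DocDocument(document_id="doc-1")

# A TED notice carries one projection and any number of lots.
notice = DocDocument(
    document_id="doc-2",
    ted_projection=TedNoticeProjection(legacy_procurement_document_id="legacy-7"),
    ted_lots=[TedNoticeLot("1"), TedNoticeLot("2")],
)
```

The generic root never requires TED data, so a future non-TED document type would only need its own optional projection alongside `ted_projection`.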

Runtime behavior

When a new TED XML document is imported:

  1. it is parsed into the existing legacy ProcurementDocument
  2. the generic DOC root is ensured/refreshed
  3. the primary text representation is ensured
  4. if the generic vectorization pipeline is enabled, a pending embedding is ensured
  5. the TED structured projection tables are refreshed from the parsed legacy document
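The five steps can be sketched as one idempotent import function. This is a minimal illustration under stated assumptions: `Store` is a hypothetical in-memory stand-in for the DOC and TED tables, `parse_legacy` stands in for the parser that produces the legacy ProcurementDocument, and the `<notice>`/`<title>`/`<lot>` XML layout is invented for the example and is not the real TED schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Store:
    """Hypothetical in-memory stand-in for the DOC and TED tables."""
    doc_document: Dict[str, dict] = field(default_factory=dict)
    doc_text_representation: Dict[str, str] = field(default_factory=dict)
    pending_embeddings: Set[str] = field(default_factory=set)
    ted_notice_projection: Dict[str, dict] = field(default_factory=dict)
    ted_notice_lot: Dict[str, List[str]] = field(default_factory=dict)


def parse_legacy(xml_text: str) -> dict:
    """Stand-in for parsing into the legacy ProcurementDocument (invented XML layout)."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "title": root.findtext("title", default=""),
        "lots": [lot.get("id") for lot in root.findall("lot")],
    }


def import_ted_xml(store: Store, xml_text: str, vectorization_enabled: bool) -> str:
    # Step 1: parse into the legacy document.
    legacy = parse_legacy(xml_text)
    doc_id = legacy["id"]

    # Step 2: ensure the generic root exists, then refresh its attributes.
    store.doc_document.setdefault(doc_id, {})["type"] = "TED"

    # Step 3: ensure the primary text representation (created only if missing).
    store.doc_text_representation.setdefault(doc_id, legacy["title"])

    # Step 4: ensure a pending embedding, only when vectorization is enabled.
    if vectorization_enabled:
        store.pending_embeddings.add(doc_id)

    # Step 5: refresh the TED projection tables from the parsed legacy document.
    store.ted_notice_projection[doc_id] = {
        "title": legacy["title"],
        "legacy_id": doc_id,  # back-reference to the legacy row
    }
    store.ted_notice_lot[doc_id] = list(legacy["lots"])
    return doc_id


store = Store()
sample = '<notice id="n1"><title>Road works</title><lot id="1"/><lot id="2"/></notice>'
import_ted_xml(store, sample, vectorization_enabled=False)
import_ted_xml(store, sample, vectorization_enabled=True)  # re-import is safe
```

Because steps 2 to 4 only ensure that rows exist and step 5 overwrites the projection, importing the same notice twice leaves a single generic root in this sketch.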

Why this phase matters

This is the first phase in which TED is explicitly modeled as a document-type projection rather than as the platform's canonical root entity. It makes the following next steps possible:

  • generic semantic search across multiple document types
  • future non-TED projections
  • migration of TED structured search to the new projection tables