
Phase 1 Generic Persistence Model

Goal

Introduce the generalized persistence backbone in an additive, non-breaking way.

New schema

The project now contains the DOC schema with the following tables:

  • doc_tenant
  • doc_document
  • doc_source
  • doc_content
  • doc_text_representation
  • doc_embedding_model
  • doc_embedding
  • doc_relation

Design choices

Owner tenant is optional

Public TED notices can remain unowned documents with visibility = PUBLIC.

Visibility is mandatory

Every canonical document must carry DocumentVisibility.
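
The two rules above (optional owner, mandatory visibility) can be illustrated with a minimal sketch. It is not the project's actual model: the class name Document, the field name owner_tenant_id, and the TENANT enum value are assumptions made for illustration. Only DocumentVisibility and PUBLIC come from this document.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DocumentVisibility(Enum):
    PUBLIC = "PUBLIC"  # named in this document
    TENANT = "TENANT"  # hypothetical second value, for illustration only


@dataclass(frozen=True)
class Document:
    # Mandatory: there is no default, so callers must pass a visibility.
    visibility: DocumentVisibility
    # Optional: None means the document has no owner tenant.
    owner_tenant_id: Optional[str] = None

    def __post_init__(self):
        # Reject None or plain strings so every document carries a real enum value.
        if not isinstance(self.visibility, DocumentVisibility):
            raise ValueError("visibility is mandatory")


# A public TED notice stays unowned.
ted_notice = Document(visibility=DocumentVisibility.PUBLIC)
```

In a relational schema the same rules would typically appear as a nullable owner column and a NOT NULL visibility column; whether doc_document is defined that way is not stated here.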

Vectorization is already separated

doc_embedding keeps the vectorization lifecycle and the model association outside doc_document. The vector payload column already exists in the schema, but the runtime continues to use the legacy TED vectorization flow until Phase 2.
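
The separation described above can be sketched with an in-memory SQLite database. Only the three table names come from this document. Every column name, the PENDING status value, and the model name are assumptions for illustration, and the real DOC schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical columns; only the table names are taken from the document.
conn.executescript("""
CREATE TABLE doc_document (
    id INTEGER PRIMARY KEY
);
CREATE TABLE doc_embedding_model (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE doc_embedding (
    id          INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES doc_document(id),
    model_id    INTEGER NOT NULL REFERENCES doc_embedding_model(id),
    status      TEXT NOT NULL DEFAULT 'PENDING',  -- lifecycle lives here
    vector      BLOB                              -- payload column, still empty
);
""")

conn.execute("INSERT INTO doc_document (id) VALUES (1)")
conn.execute("INSERT INTO doc_embedding_model (id, name) VALUES (1, 'example-model')")
# Register an embedding row without a vector, as in a phase where
# the payload column exists but is not yet written by the runtime.
conn.execute("INSERT INTO doc_embedding (document_id, model_id) VALUES (1, 1)")

row = conn.execute(
    "SELECT status, vector FROM doc_embedding WHERE document_id = 1"
).fetchone()
```

Because the lifecycle and the model reference sit on doc_embedding, a document row does not need to change when an embedding is added or recomputed.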

Content and text representation are separate

doc_content stores payload variants. doc_text_representation stores search-oriented texts. This boundary is what allows arbitrary document types to be added later.
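
As a rough illustration of this boundary, the sketch below keeps payload variants and search texts in separate lists on one document. All class names, field names, and sample values are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Content:
    """One payload variant, corresponding to a row in doc_content."""
    role: str        # hypothetical, e.g. "ORIGINAL"
    mime_type: str
    payload: bytes


@dataclass
class TextRepresentation:
    """One search-oriented text, corresponding to a row in doc_text_representation."""
    kind: str        # hypothetical, e.g. "TITLE"
    text: str


@dataclass
class DocumentRecord:
    contents: List[Content] = field(default_factory=list)
    texts: List[TextRepresentation] = field(default_factory=list)


def searchable_text(doc: DocumentRecord) -> str:
    # Search only looks at text representations, never at raw payloads.
    return " ".join(t.text for t in doc.texts)


doc = DocumentRecord(
    contents=[Content("ORIGINAL", "application/xml",
                      b"<notice><title>Road works</title></notice>")],
    texts=[TextRepresentation("TITLE", "Road works")],
)
```

With this split, a new document type only needs a way to produce text representations from its payload; code that reads searchable_text does not need to understand the payload format.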

What is still intentionally missing

  • no dual-write from TED import yet
  • no generic ingestion routes yet
  • no semantic search cutover yet
  • no TED projection tables yet
  • no historical migration yet

Result

The generalized platform is now backed by a real schema and service layer, which significantly reduces the risk of the later migration.