Phase 1 – Generic Persistence Model
Goal
Introduce the generalized persistence backbone in an additive, non-breaking way.
New schema
The project now contains the DOC schema with the following tables:
- doc_tenant
- doc_document
- doc_source
- doc_content
- doc_text_representation
- doc_embedding_model
- doc_embedding
- doc_relation
Design choices
Owner tenant is optional
Public TED notices can remain unowned documents with visibility = PUBLIC.
Visibility is mandatory
Every canonical document must carry DocumentVisibility.
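
The two rules above can be illustrated with a minimal Python sketch. It is not the project's actual entity code: the `Document` class, its field names, and every `DocumentVisibility` member other than `PUBLIC` are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional
from uuid import UUID, uuid4


class DocumentVisibility(Enum):
    PUBLIC = "PUBLIC"   # named in the text above
    TENANT = "TENANT"   # hypothetical member, for illustration only


@dataclass(frozen=True)
class Document:
    """Hypothetical in-memory mirror of a doc_document row."""
    id: UUID
    visibility: DocumentVisibility           # mandatory: no default value
    owner_tenant_id: Optional[UUID] = None   # optional: None means unowned

    def __post_init__(self):
        # Reject rows that try to skip the mandatory visibility.
        if not isinstance(self.visibility, DocumentVisibility):
            raise ValueError("visibility is mandatory")


# A public TED notice without an owner tenant.
ted_notice = Document(id=uuid4(), visibility=DocumentVisibility.PUBLIC)
```

In a relational schema the same rule would typically surface as a nullable owner column next to a NOT NULL visibility column; the actual column names are not stated here.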
Vectorization is already separated
doc_embedding holds vectorization lifecycle and model association outside doc_document.
The actual vector payload column exists in the schema, but the runtime still uses the legacy TED
vectorization flow until Phase 2.
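
As a rough sketch of what keeping the lifecycle outside doc_document could look like, the Python below models an embedding record that references a document and a model and carries its own status. The status values, field names, and the `complete` helper are assumptions; the real lifecycle states are not listed in this document.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional
from uuid import UUID, uuid4


class EmbeddingStatus(Enum):
    # Hypothetical lifecycle states, not taken from the schema.
    PENDING = "PENDING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


@dataclass(frozen=True)
class EmbeddingModel:
    """Hypothetical mirror of a doc_embedding_model row."""
    id: UUID
    name: str
    dimensions: int


@dataclass
class Embedding:
    """Hypothetical mirror of a doc_embedding row."""
    id: UUID
    document_id: UUID   # reference to the document, which stays untouched
    model_id: UUID      # reference to the embedding model
    status: EmbeddingStatus = EmbeddingStatus.PENDING
    vector: Optional[List[float]] = None   # payload stays empty until computed

    def complete(self, vector: List[float], model: EmbeddingModel) -> None:
        # Guard against payloads that do not match the model's dimensionality.
        if len(vector) != model.dimensions:
            raise ValueError("vector length does not match model dimensions")
        self.vector = list(vector)
        self.status = EmbeddingStatus.COMPLETED


model = EmbeddingModel(id=uuid4(), name="example-model", dimensions=3)
embedding = Embedding(id=uuid4(), document_id=uuid4(), model_id=model.id)
embedding.complete([0.1, 0.2, 0.3], model)
```

Because the status and the vector live on the embedding record, a document can be re-embedded with a different model without modifying the document itself.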
Content and text representation are separate
doc_content stores payload variants. doc_text_representation stores search-oriented texts.
This is the key boundary needed for arbitrary future document types.
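
The boundary can be sketched as two separate records, with the search text derived from a stored payload rather than replacing it. Everything below is illustrative: the class names, the `role` and `kind` values, and the `derive_text` helper are assumptions, and the XML snippet is made-up sample data.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Content:
    """Hypothetical mirror of a doc_content row: one payload variant."""
    document_id: UUID
    role: str        # hypothetical, e.g. "ORIGINAL"
    mime_type: str
    payload: bytes


@dataclass(frozen=True)
class TextRepresentation:
    """Hypothetical mirror of a doc_text_representation row."""
    document_id: UUID
    kind: str        # hypothetical, e.g. "FULLTEXT"
    text: str


def derive_text(content: Content) -> TextRepresentation:
    # Only XML is handled in this sketch; other types would need their own extractor.
    if content.mime_type != "application/xml":
        raise ValueError("unsupported mime type: " + content.mime_type)
    root = ET.fromstring(content.payload)
    parts = [t.strip() for t in root.itertext() if t.strip()]
    return TextRepresentation(content.document_id, "FULLTEXT", " ".join(parts))


content = Content(
    document_id=uuid4(),
    role="ORIGINAL",
    mime_type="application/xml",
    payload=b"<notice><title>Road works</title><body>Tender for resurfacing.</body></notice>",
)
text_rep = derive_text(content)
```

The payload stays byte-for-byte intact in `Content`, while `TextRepresentation` holds only what search needs, so a new document type only requires a new extractor.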
What is still intentionally missing
- no dual-write from TED import yet
- no generic ingestion routes yet
- no semantic search cutover yet
- no TED projection tables yet
- no historical migration yet
Result
The generalized platform is now backed by a real schema and service layer, which significantly reduces the risk of the later migration.