# Phase 1 – Generic Persistence Model

## Goal

Introduce the generalized persistence backbone in an additive, non-breaking way.

## New schema

The project now contains the `DOC` schema with the following tables:

- `doc_tenant`
- `doc_document`
- `doc_source`
- `doc_content`
- `doc_text_representation`
- `doc_embedding_model`
- `doc_embedding`
- `doc_relation`

## Design choices

### Owner tenant is optional

Public TED notices can remain unowned documents with `visibility = PUBLIC`.

### Visibility is mandatory

Every canonical document must carry a `DocumentVisibility`.

### Vectorization is already separated

`doc_embedding` holds the vectorization lifecycle and the model association outside `doc_document`. The column for the actual vector payload already exists in the schema, but the runtime still uses the legacy TED vectorization flow until Phase 2.

### Content and text representation are separate

`doc_content` stores payload variants, while `doc_text_representation` stores search-oriented texts. This is the key boundary needed to support arbitrary future document types.

## What is still intentionally missing

- no dual-write from the TED import yet
- no generic ingestion routes yet
- no semantic search cutover yet
- no TED projection tables yet
- no historical migration yet

## Result

The generalized platform is now backed by a real schema and service layer, which significantly reduces the risk of the later migration phases.
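To make the design choices above concrete, here is a minimal Python sketch of how some of the `DOC` tables might map onto domain objects. It is an illustration under stated assumptions, not the project's actual model: the class names only mirror the table names, and the `TENANT`/`RESTRICTED` visibility values, the `EmbeddingStatus` lifecycle states, the `content_type`, `payload`, `kind` and `text` fields, and the `complete()` method are all hypothetical. Only `DocumentVisibility` and `PUBLIC` come from the notes themselves.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional
from uuid import UUID, uuid4


class DocumentVisibility(Enum):
    PUBLIC = "PUBLIC"          # named in the design notes
    TENANT = "TENANT"          # hypothetical additional value
    RESTRICTED = "RESTRICTED"  # hypothetical additional value


class EmbeddingStatus(Enum):
    # Hypothetical lifecycle states for a doc_embedding row.
    PENDING = "PENDING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


@dataclass(frozen=True)
class Document:  # mirrors doc_document
    visibility: DocumentVisibility          # mandatory: no default value
    owner_tenant_id: Optional[UUID] = None  # optional: public notices may be unowned
    id: UUID = field(default_factory=uuid4)

    def __post_init__(self) -> None:
        # Reject plain strings or None so every document carries a real enum value.
        if not isinstance(self.visibility, DocumentVisibility):
            raise ValueError("visibility must be a DocumentVisibility")


@dataclass(frozen=True)
class Content:  # mirrors doc_content: one payload variant per row
    document_id: UUID
    content_type: str  # hypothetical field, e.g. "application/xml"
    payload: bytes


@dataclass(frozen=True)
class TextRepresentation:  # mirrors doc_text_representation: search-oriented text
    document_id: UUID
    kind: str  # hypothetical field, e.g. "title" or "full_text"
    text: str


@dataclass
class Embedding:  # mirrors doc_embedding: lifecycle lives here, not on Document
    document_id: UUID
    model_id: UUID  # references a doc_embedding_model row
    status: EmbeddingStatus = EmbeddingStatus.PENDING
    vector: Optional[list[float]] = None  # column exists; unused by the runtime until Phase 2

    def complete(self, vector: list[float]) -> None:
        # Hypothetical transition a later vectorization flow could perform.
        self.vector = vector
        self.status = EmbeddingStatus.COMPLETED


# Usage: an unowned public notice with one payload variant and one search text.
notice = Document(visibility=DocumentVisibility.PUBLIC)
xml_variant = Content(notice.id, "application/xml", b"<notice/>")
title_text = TextRepresentation(notice.id, "title", "Supply of office equipment")
embedding = Embedding(notice.id, model_id=uuid4())
```

Keeping `vector` optional and the status on `Embedding` rather than on `Document` reflects the split described above: a document can exist, be visible, and carry content and texts while its vectorization is still pending or handled by another flow.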