# Phase 1 – Generic Persistence Model

## Goal
Introduce the generalized persistence backbone in an additive, non-breaking way.

## New schema
The project now contains the `DOC` schema with the following tables:
- `doc_tenant`
- `doc_document`
- `doc_source`
- `doc_content`
- `doc_text_representation`
- `doc_embedding_model`
- `doc_embedding`
- `doc_relation`

## Design choices
### Owner tenant is optional
Public TED notices can remain unowned documents with `visibility = PUBLIC`.

### Visibility is mandatory
Every canonical document must carry `DocumentVisibility`.
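
The two rules above can be sketched as a small data model. This is an illustrative sketch only, not the project's actual entity: the `Document` class, its field names, and the `TENANT` enum value are assumptions; only `DocumentVisibility` and `PUBLIC` come from this document.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional
import uuid


class DocumentVisibility(Enum):
    PUBLIC = "PUBLIC"   # named in this document
    TENANT = "TENANT"   # hypothetical value, for illustration only


@dataclass(frozen=True)
class Document:
    """Hypothetical stand-in for a row in doc_document."""
    id: uuid.UUID
    visibility: DocumentVisibility                 # mandatory
    owner_tenant_id: Optional[uuid.UUID] = None    # optional: unowned by default

    def __post_init__(self):
        # Reject a missing or mistyped visibility at construction time.
        if not isinstance(self.visibility, DocumentVisibility):
            raise ValueError("visibility is mandatory and must be a DocumentVisibility")


# A public TED notice stays unowned: no tenant, but an explicit visibility.
ted_notice = Document(id=uuid.uuid4(), visibility=DocumentVisibility.PUBLIC)
```

Making visibility a required field while leaving the owner nullable means an unowned document is always an explicit choice rather than a missing value.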

### Vectorization is already separated
`doc_embedding` tracks the vectorization lifecycle and the model association outside `doc_document`.
The vector payload column already exists in the schema, but the runtime keeps using the legacy TED
vectorization flow until Phase 2.
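
A minimal sketch of this separation, using an in-memory SQLite database purely for illustration. The table names come from the schema list above; every column name, the `PENDING` status value, and the model name are assumptions, and the real schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE doc_document (id INTEGER PRIMARY KEY);
CREATE TABLE doc_embedding_model (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE doc_embedding (
    id          INTEGER PRIMARY KEY,
    document_id INTEGER NOT NULL REFERENCES doc_document(id),
    model_id    INTEGER NOT NULL REFERENCES doc_embedding_model(id),
    status      TEXT NOT NULL,  -- hypothetical lifecycle column
    vector      BLOB            -- payload column; nullable so it can stay empty
);
""")

conn.execute("INSERT INTO doc_document (id) VALUES (1)")
conn.execute("INSERT INTO doc_embedding_model (id, name) VALUES (1, 'example-model')")
# Lifecycle and model association are recorded without a vector payload.
conn.execute(
    "INSERT INTO doc_embedding (document_id, model_id, status) VALUES (1, 1, 'PENDING')"
)

row = conn.execute(
    "SELECT status, vector FROM doc_embedding WHERE document_id = 1"
).fetchone()
document_columns = [r[1] for r in conn.execute("PRAGMA table_info(doc_document)")]
```

In this sketch the document table carries no embedding columns at all, so a vectorization model can be added or replaced without touching `doc_document`.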

### Content and text representation are separate
`doc_content` stores payload variants. `doc_text_representation` stores the search-oriented text derived for a document.
Keeping the two apart is the key boundary that allows arbitrary document types to be added later.
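
The boundary can be sketched as two independent record types. The class names mirror the table names, but the fields, the `FULLTEXT` kind, and the `derive_text` helper are hypothetical and stand in for whatever extraction the service layer actually performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocContent:
    """Hypothetical payload variant: stored as received."""
    document_id: int
    mime_type: str
    payload: bytes


@dataclass(frozen=True)
class DocTextRepresentation:
    """Hypothetical search-oriented text derived from a payload."""
    document_id: int
    kind: str
    text: str


def derive_text(content: DocContent) -> DocTextRepresentation:
    # Only plain-text payloads are handled here; other types would need
    # their own extractor, which is outside the scope of this sketch.
    if not content.mime_type.startswith("text/"):
        raise ValueError(f"no text extractor for {content.mime_type}")
    # Collapse runs of whitespace so the text is easier to search.
    normalized = " ".join(content.payload.decode("utf-8").split())
    return DocTextRepresentation(content.document_id, "FULLTEXT", normalized)


raw = DocContent(1, "text/plain", b"Supply of   office\nfurniture")
rep = derive_text(raw)
```

The original payload is left untouched, and the derived text can be regenerated or extended per document type without rewriting stored content.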

## What is still intentionally missing
- no dual-write from TED import yet
- no generic ingestion routes yet
- no semantic search cutover yet
- no TED projection tables yet
- no historical migration yet

## Result
The generalized platform is now backed by a real schema and service layer. Because both are in place
before any data or runtime cutover, later phases can migrate onto an existing structure, which lowers migration risk.