# Phase 1 Generic Persistence Model
## Goal
Introduce the generalized persistence backbone in an additive, non-breaking way.
## New schema
The project now contains the `DOC` schema with the following tables:
- `doc_tenant`
- `doc_document`
- `doc_source`
- `doc_content`
- `doc_text_representation`
- `doc_embedding_model`
- `doc_embedding`
- `doc_relation`
## Design choices
### Owner tenant is optional
Public TED notices can remain unowned documents with `visibility = PUBLIC`.
### Visibility is mandatory
Every canonical document must carry `DocumentVisibility`.
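
The two rules above (optional owner tenant, mandatory visibility) can be sketched as a small in-memory model. This is an illustrative sketch, not the project's actual entity: the class name `Document`, the field names, and the `TENANT` enum member are assumptions; only `DocumentVisibility` and the `PUBLIC` value appear in this document.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional
from uuid import UUID, uuid4


class DocumentVisibility(Enum):
    PUBLIC = "PUBLIC"   # named in this document
    TENANT = "TENANT"   # hypothetical member, for illustration only


@dataclass(frozen=True)
class Document:
    """Hypothetical stand-in for a row in `doc_document`."""
    id: UUID
    visibility: DocumentVisibility           # mandatory: no default value
    owner_tenant_id: Optional[UUID] = None   # optional: None means "unowned"

    def __post_init__(self) -> None:
        # Reject missing or untyped visibility at construction time.
        if not isinstance(self.visibility, DocumentVisibility):
            raise ValueError("every document must carry a DocumentVisibility")


# A public TED notice can stay unowned.
ted_notice = Document(id=uuid4(), visibility=DocumentVisibility.PUBLIC)
```

Leaving `visibility` without a default means a caller cannot create a document without deciding its visibility, while `owner_tenant_id` quietly defaults to unowned.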
### Vectorization is already separated
`doc_embedding` holds the vectorization lifecycle and the model association outside `doc_document`.
The vector payload column already exists in the schema, but the runtime keeps using the legacy TED
vectorization flow until Phase 2.
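
A minimal sketch of this separation, assuming one embedding record per document and model pair. The lifecycle states (`PENDING`, `COMPLETED`, `FAILED`), the field names, and the `complete` method are hypothetical; the document only states that lifecycle and model association live in `doc_embedding` and that a vector payload column exists.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Sequence, Tuple
from uuid import UUID, uuid4


class EmbeddingStatus(Enum):
    # Hypothetical lifecycle states; the real schema may use others.
    PENDING = "PENDING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


@dataclass(frozen=True)
class EmbeddingModel:
    """Hypothetical stand-in for a row in `doc_embedding_model`."""
    id: UUID
    name: str
    dimensions: int


@dataclass
class Embedding:
    """Hypothetical stand-in for a row in `doc_embedding`."""
    id: UUID
    document_id: UUID   # reference to the document; no vector state lives there
    model_id: UUID      # model association
    status: EmbeddingStatus = EmbeddingStatus.PENDING
    vector: Optional[Tuple[float, ...]] = None   # payload stays empty until written

    def complete(self, model: EmbeddingModel, vector: Sequence[float]) -> None:
        # Guard against writing a vector produced by a different model.
        if model.id != self.model_id:
            raise ValueError("vector was produced by a different model")
        if len(vector) != model.dimensions:
            raise ValueError("vector length does not match model dimensions")
        self.vector = tuple(vector)
        self.status = EmbeddingStatus.COMPLETED


model = EmbeddingModel(id=uuid4(), name="example-model", dimensions=3)
embedding = Embedding(id=uuid4(), document_id=uuid4(), model_id=model.id)
embedding.complete(model, [0.1, 0.2, 0.3])
```

Because the lifecycle sits on the embedding record, a document can gain a second embedding for another model without any change to the document itself.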
### Content and text representation are separate
`doc_content` stores payload variants. `doc_text_representation` stores search-oriented texts.
This is the key boundary needed for arbitrary future document types.
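
The boundary can be illustrated with two record types and a derivation step between them. All names here (`Content`, `TextRepresentation`, `derive_text`, the `role` and `kind` values) are assumptions made for illustration, and the plain-text normalization is only one possible way to produce a search-oriented text.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Content:
    """Hypothetical stand-in for `doc_content`: a stored payload variant."""
    document_id: UUID
    role: str         # hypothetical, e.g. "ORIGINAL"
    mime_type: str
    payload: bytes


@dataclass(frozen=True)
class TextRepresentation:
    """Hypothetical stand-in for `doc_text_representation`."""
    document_id: UUID
    kind: str         # hypothetical, e.g. "FULLTEXT"
    text: str


def derive_text(content: Content) -> TextRepresentation:
    """Derive a search-oriented text from a plain-text payload.

    Other document types would plug in their own extractor here
    without changing how the payload itself is stored.
    """
    if content.mime_type != "text/plain":
        raise ValueError(f"no extractor for {content.mime_type}")
    # Collapse all runs of whitespace so the search text is normalized.
    normalized = " ".join(content.payload.decode("utf-8").split())
    return TextRepresentation(content.document_id, "FULLTEXT", normalized)


original = Content(uuid4(), "ORIGINAL", "text/plain", b"Hello\n   world ")
search_text = derive_text(original)
```

The payload is kept byte-for-byte, while the derived text can be regenerated or replaced independently.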
## What is still intentionally missing
- no dual-write from TED import yet
- no generic ingestion routes yet
- no semantic search cutover yet
- no TED projection tables yet
- no historical migration yet
## Result
The generalized platform is now backed by a real schema and service layer. Because Phase 1 is
purely additive, later phases can move onto this backbone step by step, which lowers migration risk.