# Phase 1 – Generic Persistence Model
## Goal
Introduce the generalized persistence backbone in an additive, non-breaking way.
## New schema
The project now contains the `DOC` schema with the following tables:
- `doc_tenant`
- `doc_document`
- `doc_source`
- `doc_content`
- `doc_text_representation`
- `doc_embedding_model`
- `doc_embedding`
- `doc_relation`
## Design choices
### Owner tenant is optional
Public TED notices can remain unowned documents with `visibility = PUBLIC`.
### Visibility is mandatory
Every canonical document must carry `DocumentVisibility`.
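
The optional owner and the mandatory visibility can be illustrated with a small in-memory model. This is a minimal sketch, not the project's actual entity: the `Document` class, its field names, and the `TENANT` value are hypothetical; only `DocumentVisibility` and `PUBLIC` come from the text above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DocumentVisibility(Enum):
    PUBLIC = "PUBLIC"   # named in this document
    TENANT = "TENANT"   # hypothetical additional value, for illustration only


@dataclass(frozen=True)
class Document:
    document_id: str
    visibility: DocumentVisibility          # mandatory: no default value
    owner_tenant_id: Optional[str] = None   # optional: unowned unless set

    def __post_init__(self) -> None:
        # Reject None or plain strings so every document carries a real enum value.
        if not isinstance(self.visibility, DocumentVisibility):
            raise ValueError("visibility must be a DocumentVisibility")


# An unowned public TED notice: no owner tenant, visibility PUBLIC.
ted_notice = Document(document_id="ted-1", visibility=DocumentVisibility.PUBLIC)
```

Leaving `visibility` without a default means a document cannot be constructed without stating it, while `owner_tenant_id` silently defaults to "unowned".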
### Vectorization is already separated
`doc_embedding` tracks the vectorization lifecycle and the model association outside `doc_document`.
The actual vector payload column exists in the schema, but the runtime still uses the legacy TED
vectorization flow until Phase 2.
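
A rough sketch of what keeping the lifecycle outside the document record could look like. Everything here is an assumption made for illustration: the `Embedding` class, its fields, and the `EmbeddingStatus` values are hypothetical and are not taken from the real schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Sequence


class EmbeddingStatus(Enum):  # hypothetical lifecycle states
    PENDING = "PENDING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


@dataclass
class Embedding:
    """One embedding of one document by one model, tracked separately."""
    document_id: str
    model_id: str
    status: EmbeddingStatus = EmbeddingStatus.PENDING
    vector: Optional[List[float]] = None  # payload slot, empty until computed

    def complete(self, vector: Sequence[float]) -> None:
        # Store the payload and advance the lifecycle in one step.
        self.vector = list(vector)
        self.status = EmbeddingStatus.COMPLETED


# The document record itself carries no vector or status fields.
embedding = Embedding(document_id="ted-1", model_id="model-a")
initial_status = embedding.status
embedding.complete([0.1, 0.2, 0.3])
```

Because the record is keyed by document and model, the same document could later be embedded by a second model without touching the document row.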
### Content and text representation are separate
`doc_content` stores payload variants. `doc_text_representation` stores search-oriented texts.
This is the key boundary needed for arbitrary future document types.
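
The boundary can be sketched with an in-memory SQLite database. The table names match the list above, but every column (`role`, `mime_type`, `payload`, `kind`, `body`) and every sample value is a hypothetical placeholder, and SQLite merely stands in for whatever database the project actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doc_document (id TEXT PRIMARY KEY);
CREATE TABLE doc_content (
    id INTEGER PRIMARY KEY,
    document_id TEXT NOT NULL REFERENCES doc_document(id),
    role TEXT NOT NULL,        -- hypothetical: which payload variant this is
    mime_type TEXT NOT NULL,
    payload BLOB NOT NULL
);
CREATE TABLE doc_text_representation (
    id INTEGER PRIMARY KEY,
    document_id TEXT NOT NULL REFERENCES doc_document(id),
    kind TEXT NOT NULL,        -- hypothetical: which search text this is
    body TEXT NOT NULL
);
""")

conn.execute("INSERT INTO doc_document (id) VALUES (?)", ("ted-1",))
conn.execute(
    "INSERT INTO doc_content (document_id, role, mime_type, payload) VALUES (?, ?, ?, ?)",
    ("ted-1", "ORIGINAL", "application/xml", b"<notice>...</notice>"),
)
conn.execute(
    "INSERT INTO doc_text_representation (document_id, kind, body) VALUES (?, ?, ?)",
    ("ted-1", "FULLTEXT", "Road maintenance services"),
)

# Search reads only the text representation; the raw payload is never scanned.
hits = [
    row[0]
    for row in conn.execute(
        "SELECT document_id FROM doc_text_representation WHERE body LIKE ?",
        ("%maintenance%",),
    )
]
```

A new document type would only need its own payload variants and a way to produce search text; the search query itself would stay unchanged.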
## What is still intentionally missing
- no dual-write from TED import yet
- no generic ingestion routes yet
- no semantic search cutover yet
- no TED projection tables yet
- no historical migration yet
## Result
The generalized platform is now backed by a real schema and service layer, which significantly
reduces the risk of the later migration.