Phase 1 – Generic Persistence Model

Goal

Introduce the generalized persistence backbone in an additive, non-breaking way.

New schema

The project now contains the DOC schema with the following tables:

doc_tenant
doc_document
doc_source
doc_content
doc_text_representation
doc_embedding_model
doc_embedding
doc_relation

Design choices

Owner tenant is optional

Public TED notices can remain unowned documents with visibility = PUBLIC.

Visibility is mandatory

Every canonical document must carry DocumentVisibility.
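
The two rules above (optional owner, mandatory visibility) can be sketched as a small model. This is an illustrative Python sketch, not the project's actual code: only DocumentVisibility and the PUBLIC value appear in this document. The TENANT value, the Document class, and all field names are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DocumentVisibility(Enum):
    PUBLIC = "PUBLIC"   # named in this document
    TENANT = "TENANT"   # hypothetical second value, for illustration only


@dataclass(frozen=True)
class Document:
    """Hypothetical stand-in for a doc_document row."""
    document_id: str
    visibility: DocumentVisibility          # mandatory: no default value
    owner_tenant_id: Optional[str] = None   # optional: None means unowned

    def __post_init__(self) -> None:
        # Reject a missing visibility or a raw string passed in its place.
        if not isinstance(self.visibility, DocumentVisibility):
            raise ValueError("visibility is mandatory and must be a DocumentVisibility")


# A public TED notice stays unowned.
ted_notice = Document(document_id="ted-1", visibility=DocumentVisibility.PUBLIC)
```

Giving visibility no default forces every caller to state it explicitly, while the owner defaults to None so that unowned public documents need no placeholder tenant.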

Vectorization is already separated

doc_embedding holds the vectorization lifecycle and the model association outside doc_document. The vector payload column already exists in the schema, but the runtime still uses the legacy TED vectorization flow until Phase 2.
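
As a rough illustration of keeping lifecycle state and model association on a separate record, the sketch below models an embedding row independently of the document. The status values, field names, and helper functions are hypothetical; the document only states that doc_embedding carries lifecycle and model association and that a vector payload column exists.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Sequence


class EmbeddingStatus(Enum):
    # Hypothetical lifecycle states; the real schema may differ.
    PENDING = "PENDING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


@dataclass
class Embedding:
    """Hypothetical stand-in for a doc_embedding row."""
    document_id: str   # assumed reference to the embedded document
    model_id: str      # assumed reference to a doc_embedding_model row
    status: EmbeddingStatus = EmbeddingStatus.PENDING
    vector: Optional[List[float]] = None  # payload may stay empty for now

    def complete(self, vector: Sequence[float]) -> None:
        # Store the payload and advance the lifecycle in one step.
        self.vector = list(vector)
        self.status = EmbeddingStatus.COMPLETED


def pending(embeddings: Sequence[Embedding]) -> List[Embedding]:
    # Rows a future vectorization worker would still need to process.
    return [e for e in embeddings if e.status is EmbeddingStatus.PENDING]
```

Because the status lives on the embedding record, a document can exist without any vector, which matches the Phase 1 state where the payload column is present but not yet populated by the new flow.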

Content and text representation are separate

doc_content stores payload variants. doc_text_representation stores search-oriented texts. This is the key boundary needed for arbitrary future document types.
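
The boundary can be pictured as two record types, one holding the payload and one holding text derived for search. The following is a minimal sketch under assumptions: the class names, the fields (media_type, kind), and the to_search_text helper are invented for illustration and do not come from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Content:
    """Hypothetical stand-in for a doc_content row: one payload variant."""
    document_id: str
    media_type: str
    payload: bytes


@dataclass(frozen=True)
class TextRepresentation:
    """Hypothetical stand-in for a doc_text_representation row."""
    document_id: str
    kind: str
    text: str


def to_search_text(content: Content) -> TextRepresentation:
    # Toy derivation: decode the payload and collapse runs of whitespace.
    raw = content.payload.decode("utf-8", errors="replace")
    return TextRepresentation(content.document_id, "plain", " ".join(raw.split()))


original = Content("d1", "text/plain", b"Hello \n  world")
searchable = to_search_text(original)
```

The payload stays byte-exact in Content, while the search text can be regenerated or replaced without touching it. A new document type would then only need its own derivation step.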

What is still intentionally missing

no dual-write from TED import yet
no generic ingestion routes yet
no semantic search cutover yet
no TED projection tables yet
no historical migration yet

Result

The generalized platform is now backed by a real schema and service layer, which significantly reduces the risk of the later migration.
