# Phase 0 Architecture Foundation
## New project identity
- **Project name:** Procon Document Intelligence Platform
- **Short name:** DIP
- **Base namespace:** `at.procon.dip`
- **Legacy namespace kept during transition:** `at.procon.ted`
## Why this naming
The application is no longer only a TED notice processor. The new name reflects the broader goal:
import arbitrary document types, derive canonical searchable text, vectorize it, and run semantic
search over those representations.
## Phase 0 decisions implemented in code
1. New Spring Boot entry point under `at.procon.dip`
2. Legacy TED runtime kept through explicit package scanning
3. Generic vocabulary introduced via enums in `at.procon.dip.domain.document`
4. Tenant introduced as a first-class value object in `at.procon.dip.domain.tenant`
5. Ownership and access are explicitly separated through `DocumentAccessContext`
6. Canonical document metadata and ingestion descriptors support both:
   - tenant-owned documents
   - public documents without tenant ownership
7. Extension-point interfaces introduced for ingestion, classification, extraction,
normalization, and vectorization
8. Target schema split documented as:
   - `DOC` for generic document model
   - `TED` for TED-specific projections
9. Migration strategy formalized as phased additive migration:
   - additive schema
   - dual write
   - backfill
   - cutover
   - retire legacy
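
The phased additive migration in decision 9 can be sketched as an ordered enum. This is only an illustration: `MigrationPhase` and its methods are hypothetical names, and the rules for which model is read or written in each phase are assumptions, not something this document specifies.

```java
import java.util.Optional;

// Hypothetical enum: constant names mirror the five steps listed above.
enum MigrationPhase {
    ADDITIVE_SCHEMA, DUAL_WRITE, BACKFILL, CUTOVER, RETIRE_LEGACY;

    // Assumption: the legacy TED model keeps receiving writes until it is retired.
    boolean writesLegacy() {
        return this != RETIRE_LEGACY;
    }

    // Assumption: the generic model receives writes from the dual-write step onward.
    boolean writesGeneric() {
        return compareTo(DUAL_WRITE) >= 0;
    }

    // Assumption: reads move to the generic model only at cutover.
    boolean readsGeneric() {
        return compareTo(CUTOVER) >= 0;
    }

    // Next step in the sequence, or empty once the legacy model is retired.
    Optional<MigrationPhase> next() {
        MigrationPhase[] all = values();
        return ordinal() + 1 < all.length
                ? Optional.of(all[ordinal() + 1])
                : Optional.empty();
    }
}
```

Keeping the phases in declaration order lets each rule be a simple comparison, so a phase can never be skipped by accident when advancing with `next()`.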
## Planned package areas
- `at.procon.dip.architecture`
- `at.procon.dip.domain.access`
- `at.procon.dip.domain.document`
- `at.procon.dip.domain.tenant`
- `at.procon.dip.ingestion.spi`
- `at.procon.dip.classification.spi`
- `at.procon.dip.extraction.spi`
- `at.procon.dip.normalization.spi`
- `at.procon.dip.vectorization.spi`
- `at.procon.dip.search.spi`
- `at.procon.dip.processing.spi`
- `at.procon.dip.migration`
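
To illustrate how the `spi` packages above might fit together, here is a minimal sketch of four extension points chained into one processing step. All interface names, method signatures, and the `ProcessingPipeline` class are assumptions made for illustration; the actual contracts in `at.procon.dip` may look different. Ingestion is represented only by the raw bytes handed to the pipeline.

```java
import java.util.Objects;

// Hypothetical SPI contracts, one per extension point; real signatures may differ.
interface DocumentClassifier {
    String classify(byte[] raw);
}

interface TextExtractor {
    String extract(byte[] raw, String documentType);
}

interface TextNormalizer {
    String normalize(String text);
}

interface Vectorizer {
    float[] vectorize(String canonicalText);
}

// Hypothetical orchestrator: classify, extract, normalize, then vectorize.
final class ProcessingPipeline {
    private final DocumentClassifier classifier;
    private final TextExtractor extractor;
    private final TextNormalizer normalizer;
    private final Vectorizer vectorizer;

    ProcessingPipeline(DocumentClassifier classifier, TextExtractor extractor,
                       TextNormalizer normalizer, Vectorizer vectorizer) {
        this.classifier = Objects.requireNonNull(classifier);
        this.extractor = Objects.requireNonNull(extractor);
        this.normalizer = Objects.requireNonNull(normalizer);
        this.vectorizer = Objects.requireNonNull(vectorizer);
    }

    // Turns raw document bytes into a vector over the canonical text.
    float[] process(byte[] raw) {
        String type = classifier.classify(raw);
        String text = extractor.extract(raw, type);
        String canonical = normalizer.normalize(text);
        return vectorizer.vectorize(canonical);
    }
}
```

Because each contract has a single method, document-type-specific implementations (for example a TED-specific extractor) could be swapped in without touching the orchestration code.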
## Ownership and visibility decision
A tenant represents the owner of a document, but ownership is optional.
A public TED notice therefore does not need a fake tenant. Instead, the canonical model uses:
- optional `ownerTenant`
- mandatory `DocumentVisibility`

Examples:
- TED notice: `ownerTenant = null`, `visibility = PUBLIC`
- customer-private document: `ownerTenant = tenantA`, `visibility = TENANT`
- explicitly shared document: `ownerTenant = tenantA`, `visibility = SHARED`

Phase 1 now realizes this persistence direction through the additive `DOC` schema. The resulting
backbone uses:
- `DOC.doc_document.owner_tenant_id` nullable
- `DOC.doc_document.visibility` not null

The complete Phase 1 persistence details are documented in `docs/architecture/PHASE1_GENERIC_PERSISTENCE_MODEL.md`.
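
The ownership and visibility rule can be sketched in plain Java as follows. `DocumentAccessContext` and `DocumentVisibility` are named in this document, but their fields, the factory methods, the `TenantRef` type, and the rule that `TENANT` and `SHARED` visibility require an owner are assumptions made for illustration.

```java
import java.util.Objects;
import java.util.Optional;

// Visibility values taken from the examples above.
enum DocumentVisibility { PUBLIC, TENANT, SHARED }

// Hypothetical tenant value object; the real one lives in at.procon.dip.domain.tenant.
final class TenantRef {
    private final String key;

    TenantRef(String key) {
        this.key = Objects.requireNonNull(key, "key");
    }

    String key() {
        return key;
    }
}

// Sketch of the access context: the owner is optional, visibility is mandatory.
final class DocumentAccessContext {
    private final TenantRef ownerTenant;          // may be null, like owner_tenant_id
    private final DocumentVisibility visibility;  // never null, like visibility

    DocumentAccessContext(TenantRef ownerTenant, DocumentVisibility visibility) {
        this.visibility = Objects.requireNonNull(visibility, "visibility");
        // Assumption: only PUBLIC documents may exist without an owning tenant.
        if (ownerTenant == null && visibility != DocumentVisibility.PUBLIC) {
            throw new IllegalArgumentException(visibility + " requires an owner tenant");
        }
        this.ownerTenant = ownerTenant;
    }

    // A public document such as a TED notice: no fake tenant is needed.
    static DocumentAccessContext publicDocument() {
        return new DocumentAccessContext(null, DocumentVisibility.PUBLIC);
    }

    static DocumentAccessContext ownedBy(TenantRef owner, DocumentVisibility visibility) {
        return new DocumentAccessContext(Objects.requireNonNull(owner, "owner"), visibility);
    }

    Optional<TenantRef> ownerTenant() {
        return Optional.ofNullable(ownerTenant);
    }

    DocumentVisibility visibility() {
        return visibility;
    }
}
```

Under these assumptions, `DocumentAccessContext.publicDocument()` models the TED notice case, while `DocumentAccessContext.ownedBy(new TenantRef("tenantA"), DocumentVisibility.TENANT)` models a customer-private document.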
## Non-goals of Phase 0
- No database schema migration yet
- No runtime behavior changes in TED processing
- No replacement of `ProcurementDocument` yet
- No semantic search refactoring yet
## Result
The codebase now has a stable generalized namespace and contract surface for future phases without
requiring a disruptive rewrite.