# Phase 0 Architecture Foundation
## New project identity
- **Project name:** Procon Document Intelligence Platform
- **Short name:** DIP
- **Base namespace:** `at.procon.dip`
- **Legacy namespace kept during transition:** `at.procon.ted`
## Why this naming
The application is no longer only a TED notice processor. The new name reflects the broader goal:
import arbitrary document types, derive canonical searchable text, vectorize it, and run semantic
search over those representations.
## Phase 0 decisions implemented in code
1. New Spring Boot entry point under `at.procon.dip`
2. Legacy TED runtime kept through explicit package scanning
3. Generic vocabulary introduced via enums in `at.procon.dip.domain.document`
4. Tenant introduced as a first-class value object in `at.procon.dip.domain.tenant`
5. Ownership and access are explicitly separated through `DocumentAccessContext`
6. Canonical document metadata and ingestion descriptors support both:
   - tenant-owned documents
   - public documents without tenant ownership
7. Extension-point interfaces introduced for ingestion, classification, extraction,
normalization, and vectorization
8. Target schema split documented as:
   - `DOC` for generic document model
   - `TED` for TED-specific projections
9. Migration strategy formalized as phased additive migration:
   - additive schema
   - dual write
   - backfill
   - cutover
   - retire legacy
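
The phase order in decision 9 can be sketched as a small Java enum. `MigrationPhase`, `next()`, and `MigrationPhaseDemo` are hypothetical names chosen for illustration and are not confirmed to exist in the codebase; the sketch encodes only the ordering listed above.

```java
import java.util.Optional;

// Hypothetical illustration: the enum name and helper are assumptions;
// only the five phase names and their order come from the list above.
enum MigrationPhase {
    ADDITIVE_SCHEMA,
    DUAL_WRITE,
    BACKFILL,
    CUTOVER,
    RETIRE_LEGACY;

    // Returns the phase that follows this one, or empty after the last phase.
    Optional<MigrationPhase> next() {
        MigrationPhase[] all = values();
        int nextIndex = ordinal() + 1;
        return nextIndex < all.length ? Optional.of(all[nextIndex]) : Optional.empty();
    }
}

final class MigrationPhaseDemo {
    public static void main(String[] args) {
        // Walk the phases in declaration order and print each one.
        Optional<MigrationPhase> phase = Optional.of(MigrationPhase.ADDITIVE_SCHEMA);
        while (phase.isPresent()) {
            System.out.println(phase.get());
            phase = phase.get().next();
        }
    }
}
```

Modelling the phases as an ordered enum keeps "which phase comes next" in one place, which may help if feature toggles or migration checks later need to compare phases.
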
## Planned package areas
- `at.procon.dip.architecture`
- `at.procon.dip.domain.access`
- `at.procon.dip.domain.document`
- `at.procon.dip.domain.tenant`
- `at.procon.dip.ingestion.spi`
- `at.procon.dip.classification.spi`
- `at.procon.dip.extraction.spi`
- `at.procon.dip.normalization.spi`
- `at.procon.dip.vectorization.spi`
- `at.procon.dip.search.spi`
- `at.procon.dip.processing.spi`
- `at.procon.dip.migration`
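
To give a feel for how the `spi` packages could fit together, here is a minimal sketch of three extension points chained into one flow: extract text, normalize it into canonical searchable text, then vectorize it. The interface names (`TextExtractor`, `TextNormalizer`, `Vectorizer`), their method signatures, and `PipelineSketch` are assumptions made for this example and are not the actual Phase 0 contracts; package declarations are omitted so the snippet compiles as a single file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical contracts; the real interfaces may differ in name and shape.
interface TextExtractor {            // would live in at.procon.dip.extraction.spi
    String extract(byte[] rawContent);
}

interface TextNormalizer {           // would live in at.procon.dip.normalization.spi
    String normalize(String extractedText);
}

interface Vectorizer {               // would live in at.procon.dip.vectorization.spi
    float[] vectorize(String canonicalText);
}

final class PipelineSketch {
    // Chains the three extension points in the order described in this document.
    static float[] run(byte[] raw, TextExtractor extractor,
                       TextNormalizer normalizer, Vectorizer vectorizer) {
        String extracted = extractor.extract(raw);
        String canonical = normalizer.normalize(extracted);
        return vectorizer.vectorize(canonical);
    }

    public static void main(String[] args) {
        TextExtractor utf8 = raw -> new String(raw, StandardCharsets.UTF_8);
        // Trim, collapse whitespace runs, and lower-case the text.
        TextNormalizer collapse = text ->
                text.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
        // Toy stand-in for an embedding model: a single-element vector with the text length.
        Vectorizer toy = text -> new float[] { text.length() };

        float[] vector = run("  Hello   TED  ".getBytes(StandardCharsets.UTF_8),
                utf8, collapse, toy);
        System.out.println(vector[0]);
    }
}
```

Keeping each step behind its own small interface means a TED-specific extractor and a generic one could be swapped without touching the rest of the flow.
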
## Ownership and visibility decision
A tenant represents the owner of a document, but ownership is optional.
A public TED notice therefore does not need a fake tenant. Instead, the canonical model uses:
- optional `ownerTenant`
- mandatory `DocumentVisibility`

Examples:
- TED notice: `ownerTenant = null`, `visibility = PUBLIC`
- customer-private document: `ownerTenant = tenantA`, `visibility = TENANT`
- explicitly shared document: `ownerTenant = tenantA`, `visibility = SHARED`
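
The three examples above can be expressed as a minimal Java sketch. `DocumentAccessContext`, `DocumentVisibility`, and `ownerTenant` are named in this document, but their shape here is an assumption: `TenantRef`, the factory methods, and `hasOwner()` are hypothetical helpers, and records require Java 16 or later.

```java
import java.util.Objects;

// Visibility values taken from the examples in this document.
enum DocumentVisibility { PUBLIC, TENANT, SHARED }

// Hypothetical tenant value object; the real type in at.procon.dip.domain.tenant may differ.
record TenantRef(String tenantKey) {
    TenantRef {
        Objects.requireNonNull(tenantKey, "tenantKey must not be null");
    }
}

// Assumed shape: the owner is optional (nullable), the visibility is mandatory.
record DocumentAccessContext(TenantRef ownerTenant, DocumentVisibility visibility) {
    DocumentAccessContext {
        Objects.requireNonNull(visibility, "visibility must not be null");
    }

    // A public document without tenant ownership, e.g. a TED notice.
    static DocumentAccessContext publicDocument() {
        return new DocumentAccessContext(null, DocumentVisibility.PUBLIC);
    }

    // A document owned by a tenant, with the given visibility.
    static DocumentAccessContext ownedBy(TenantRef owner, DocumentVisibility visibility) {
        Objects.requireNonNull(owner, "owner must not be null");
        return new DocumentAccessContext(owner, visibility);
    }

    boolean hasOwner() {
        return ownerTenant != null;
    }
}

final class AccessContextDemo {
    public static void main(String[] args) {
        TenantRef tenantA = new TenantRef("tenantA");
        System.out.println(DocumentAccessContext.publicDocument());
        System.out.println(DocumentAccessContext.ownedBy(tenantA, DocumentVisibility.TENANT));
        System.out.println(DocumentAccessContext.ownedBy(tenantA, DocumentVisibility.SHARED));
    }
}
```

Because ownership and visibility are separate fields, a public notice needs no placeholder tenant, while the mandatory visibility check rejects a context that leaves access undefined.
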

Phase 1 now realizes this persistence direction through the additive `DOC` schema. The resulting
backbone uses:
- `DOC.doc_document.owner_tenant_id` nullable
- `DOC.doc_document.visibility` not null

The complete Phase 1 persistence details are documented in `docs/architecture/PHASE1_GENERIC_PERSISTENCE_MODEL.md`.
## Non-goals of Phase 0
- No database schema migration yet
- No runtime behavior changes in TED processing
- No replacement of `ProcurementDocument` yet
- No semantic search refactoring yet
## Result
The codebase now has a stable generalized namespace and contract surface for future phases without
requiring a disruptive rewrite.