# Phase 0 – Architecture Foundation

## New project identity
- **Project name:** Procon Document Intelligence Platform
- **Short name:** DIP
- **Base namespace:** `at.procon.dip`
- **Legacy namespace kept during transition:** `at.procon.ted`

## Why this naming
The application is no longer only a TED notice processor. The new name reflects the broader goal:
import arbitrary document types, derive canonical searchable text, vectorize it, and run semantic
search over those representations.

## Phase 0 decisions implemented in code
1. New Spring Boot entry point under `at.procon.dip`
2. Legacy TED runtime kept through explicit package scanning
3. Generic vocabulary introduced via enums in `at.procon.dip.domain.document`
4. Tenant introduced as a first-class value object in `at.procon.dip.domain.tenant`
5. Ownership and access are explicitly separated through `DocumentAccessContext`
6. Canonical document metadata and ingestion descriptors support both:
   - tenant-owned documents
   - public documents without tenant ownership
7. Extension-point interfaces introduced for ingestion, classification, extraction,
   normalization, and vectorization
8. Target schema split documented as:
   - `DOC` for generic document model
   - `TED` for TED-specific projections
9. Migration strategy formalized as phased additive migration:
   - additive schema
   - dual write
   - backfill
   - cutover
   - retire legacy

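The phased additive migration in decision 9 can be pictured as an ordered enum whose position decides which store receives writes and which one serves reads. The following is a minimal sketch under assumptions: `MigrationPhase`, `writesToLegacy`, `writesToDoc`, `readsFromDoc`, and `next` are hypothetical names, and the per-phase routing rules reflect one common reading of a dual-write migration, not something this document specifies.

```java
// Hypothetical sketch: none of these names are taken from the actual codebase.
final class MigrationPhaseSketch {
    public static void main(String[] args) {
        // Print the assumed read/write routing for every phase, in order.
        for (MigrationPhase phase : MigrationPhase.values()) {
            System.out.printf("%-15s legacyWrite=%b docWrite=%b docRead=%b%n",
                    phase, phase.writesToLegacy(), phase.writesToDoc(), phase.readsFromDoc());
        }
    }
}

/** The five phases from the list above, declared in execution order. */
enum MigrationPhase {
    ADDITIVE_SCHEMA, DUAL_WRITE, BACKFILL, CUTOVER, RETIRE_LEGACY;

    /** Assumption: the legacy store keeps receiving writes until it is retired. */
    boolean writesToLegacy() {
        return compareTo(RETIRE_LEGACY) < 0;
    }

    /** Assumption: the new DOC schema receives writes from dual write onward. */
    boolean writesToDoc() {
        return compareTo(DUAL_WRITE) >= 0;
    }

    /** Assumption: reads switch to the DOC schema at cutover. */
    boolean readsFromDoc() {
        return compareTo(CUTOVER) >= 0;
    }

    /** Phases only move forward, one step at a time. */
    MigrationPhase next() {
        if (this == RETIRE_LEGACY) {
            throw new IllegalStateException("RETIRE_LEGACY is the final phase");
        }
        return values()[ordinal() + 1];
    }
}
```

Keeping the routing decision in one place like this would let each phase change be a configuration switch rather than a code change, which fits the additive, non-disruptive intent of the strategy.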
## Planned package areas
- `at.procon.dip.architecture`
- `at.procon.dip.domain.access`
- `at.procon.dip.domain.document`
- `at.procon.dip.domain.tenant`
- `at.procon.dip.ingestion.spi`
- `at.procon.dip.classification.spi`
- `at.procon.dip.extraction.spi`
- `at.procon.dip.normalization.spi`
- `at.procon.dip.vectorization.spi`
- `at.procon.dip.search.spi`
- `at.procon.dip.processing.spi`
- `at.procon.dip.migration`

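The `.spi` packages above suggest that each processing stage is a pluggable contract. As an illustration only, an extraction extension point might look like the sketch below. The names `TextExtractor`, `PlainTextExtractor`, `ExtractorRegistry`, and the `DocumentType` values are assumptions made for this example and are not taken from the codebase.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: the interface and type names below are illustrative only.
final class ExtractionSpiSketch {
    public static void main(String[] args) {
        List<TextExtractor> extractors = List.of(new PlainTextExtractor());
        byte[] content = "  Contract notice  ".getBytes(StandardCharsets.UTF_8);

        // A supported type is handled by the first matching extractor.
        System.out.println(ExtractorRegistry.extract(extractors, DocumentType.PLAIN_TEXT, content));
        // An unsupported type yields an empty result instead of an exception.
        System.out.println(ExtractorRegistry.extract(extractors, DocumentType.PDF, content));
    }
}

/** Assumed document vocabulary; the real enum values are not given in this document. */
enum DocumentType { TED_NOTICE, PDF, PLAIN_TEXT }

/** Extension point: one implementation per supported document type. */
interface TextExtractor {
    boolean supports(DocumentType type);

    String extract(byte[] content);
}

/** Trivial implementation that decodes UTF-8 bytes and strips surrounding whitespace. */
final class PlainTextExtractor implements TextExtractor {
    @Override
    public boolean supports(DocumentType type) {
        return type == DocumentType.PLAIN_TEXT;
    }

    @Override
    public String extract(byte[] content) {
        return new String(content, StandardCharsets.UTF_8).strip();
    }
}

/** Picks the first registered extractor that supports the given type. */
final class ExtractorRegistry {
    private ExtractorRegistry() {
    }

    static Optional<String> extract(List<TextExtractor> extractors, DocumentType type, byte[] content) {
        return extractors.stream()
                .filter(extractor -> extractor.supports(type))
                .findFirst()
                .map(extractor -> extractor.extract(content));
    }
}
```

With a contract of this shape, TED-specific handling becomes one implementation among several, and new document types can be added without touching the callers.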
## Ownership and visibility decision
A tenant represents the owner of a document, but ownership is optional.

A public TED notice therefore does not need a fake tenant. Instead, the canonical model uses:
- optional `ownerTenant`
- mandatory `DocumentVisibility`

Examples:
- TED notice: `ownerTenant = null`, `visibility = PUBLIC`
- customer-private document: `ownerTenant = tenantA`, `visibility = TENANT`
- explicitly shared document: `ownerTenant = tenantA`, `visibility = SHARED`

Phase 1 now realizes this persistence direction through the additive `DOC` schema. The resulting
backbone uses:
- `DOC.doc_document.owner_tenant_id` nullable
- `DOC.doc_document.visibility` not null

The complete Phase 1 persistence details are documented in `docs/architecture/PHASE1_GENERIC_PERSISTENCE_MODEL.md`.

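The ownership and visibility rule in this section (optional owner, mandatory visibility) can be sketched in plain Java as follows. This is a sketch under assumptions, not the project's actual code: `TenantRef`, its `key` field, and the factory methods `publicDocument` and `ownedBy` are hypothetical, and the real `DocumentAccessContext` and `DocumentVisibility` types may be shaped differently.

```java
import java.util.Objects;
import java.util.Optional;

// Hypothetical sketch: the real types under at.procon.dip.domain may differ.
final class AccessContextSketch {
    public static void main(String[] args) {
        TenantRef tenantA = new TenantRef("tenantA");

        // The three examples from the section, expressed as values.
        DocumentAccessContext tedNotice = DocumentAccessContext.publicDocument();
        DocumentAccessContext privateDoc = DocumentAccessContext.ownedBy(tenantA, DocumentVisibility.TENANT);
        DocumentAccessContext sharedDoc = DocumentAccessContext.ownedBy(tenantA, DocumentVisibility.SHARED);

        System.out.println(tedNotice);
        System.out.println(privateDoc);
        System.out.println(sharedDoc);
    }
}

/** Visibility values named in the examples of this section. */
enum DocumentVisibility { PUBLIC, TENANT, SHARED }

/** Tenant as a value object; the single `key` field is an assumption. */
record TenantRef(String key) {
    TenantRef {
        Objects.requireNonNull(key, "key");
    }
}

/** Ownership is optional (nullable owner), visibility is mandatory. */
record DocumentAccessContext(TenantRef ownerTenant, DocumentVisibility visibility) {
    DocumentAccessContext {
        Objects.requireNonNull(visibility, "visibility");
    }

    /** A public document needs no placeholder tenant. */
    static DocumentAccessContext publicDocument() {
        return new DocumentAccessContext(null, DocumentVisibility.PUBLIC);
    }

    /** A tenant-owned document with the given visibility. */
    static DocumentAccessContext ownedBy(TenantRef tenant, DocumentVisibility visibility) {
        Objects.requireNonNull(tenant, "tenant");
        return new DocumentAccessContext(tenant, visibility);
    }

    /** Null-safe view of the optional owner. */
    Optional<TenantRef> owner() {
        return Optional.ofNullable(ownerTenant);
    }
}
```

The nullable owner and the non-null visibility in this sketch mirror the nullable `owner_tenant_id` and the not-null `visibility` columns described above.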
## Non-goals of Phase 0
- No database schema migration yet
- No runtime behavior changes in TED processing
- No replacement of `ProcurementDocument` yet
- No semantic search refactoring yet

## Result
The codebase now has a stable generalized namespace and contract surface for future phases without
requiring a disruptive rewrite.