# Phase 0 – Architecture Foundation

## New project identity

- **Project name:** Procon Document Intelligence Platform
- **Short name:** DIP
- **Base namespace:** `at.procon.dip`
- **Legacy namespace kept during transition:** `at.procon.ted`

## Why this naming

The application is no longer only a TED notice processor. The new name reflects the broader goal:
import arbitrary document types, derive canonical searchable text, vectorize it, and run semantic
search over those representations.

## Phase 0 decisions implemented in code

1. New Spring Boot entry point under `at.procon.dip`
2. Legacy TED runtime kept through explicit package scanning
3. Generic vocabulary introduced via enums in `at.procon.dip.domain.document`
4. Tenant introduced as a first-class value object in `at.procon.dip.domain.tenant`
5. Ownership and access are explicitly separated through `DocumentAccessContext`
6. Canonical document metadata and ingestion descriptors support both:
   - tenant-owned documents
   - public documents without tenant ownership
7. Extension-point interfaces introduced for ingestion, classification, extraction,
   normalization, and vectorization
8. Target schema split documented as:
   - `DOC` for generic document model
   - `TED` for TED-specific projections
9. Migration strategy formalized as phased additive migration:
   - additive schema
   - dual write
   - backfill
   - cutover
   - retire legacy

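The phased migration in decision 9 can be pictured as an ordered enum that tells callers which schema to write to and read from at each stage. The following is only a sketch: `MigrationSketch`, `MigrationPhase`, and its methods are hypothetical names, and the rule that legacy writes stop at cutover is an assumption, not something this document specifies.

```java
import java.util.Optional;

final class MigrationSketch {

    /** Hypothetical enum mirroring the five migration steps listed above, in order. */
    enum MigrationPhase {
        ADDITIVE_SCHEMA, DUAL_WRITE, BACKFILL, CUTOVER, RETIRE_LEGACY;

        /** New (DOC) schema receives writes from the dual-write step onward. */
        boolean writesToNewSchema() {
            return compareTo(DUAL_WRITE) >= 0;
        }

        /** Assumption: legacy writes continue until cutover, then stop. */
        boolean writesToLegacySchema() {
            return compareTo(CUTOVER) < 0;
        }

        /** Assumption: reads switch to the new schema at cutover. */
        boolean readsFromNewSchema() {
            return compareTo(CUTOVER) >= 0;
        }

        /** The following phase, or empty once the legacy model is retired. */
        Optional<MigrationPhase> next() {
            int i = ordinal() + 1;
            return i < values().length ? Optional.of(values()[i]) : Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Print the write/read routing for every phase.
        for (MigrationPhase phase : MigrationPhase.values()) {
            System.out.printf("%-16s writeNew=%b writeLegacy=%b readNew=%b%n",
                    phase,
                    phase.writesToNewSchema(),
                    phase.writesToLegacySchema(),
                    phase.readsFromNewSchema());
        }
    }
}
```

Because each phase only adds capability on top of the previous one, a rollback before cutover amounts to moving the phase marker back; whether the real migration tooling works this way is not stated here.
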
## Planned package areas

- `at.procon.dip.architecture`
- `at.procon.dip.domain.access`
- `at.procon.dip.domain.document`
- `at.procon.dip.domain.tenant`
- `at.procon.dip.ingestion.spi`
- `at.procon.dip.classification.spi`
- `at.procon.dip.extraction.spi`
- `at.procon.dip.normalization.spi`
- `at.procon.dip.vectorization.spi`
- `at.procon.dip.search.spi`
- `at.procon.dip.processing.spi`
- `at.procon.dip.migration`

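To illustrate how the `spi` packages above might fit together, here is a minimal sketch of three extension points chained into one pipeline. All type and method names (`TextExtractor`, `TextNormalizer`, `Vectorizer`, `DocumentPipeline`) are hypothetical and the toy implementations exist only to make the example runnable; the actual contracts in the codebase may look different.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

final class SpiSketch {

    // Hypothetical extension points, one per stage.
    interface TextExtractor { String extract(byte[] rawContent); }
    interface TextNormalizer { String normalize(String extractedText); }
    interface Vectorizer { float[] vectorize(String canonicalText); }

    /** Chains extraction, normalization, and vectorization in that order. */
    static final class DocumentPipeline {
        private final TextExtractor extractor;
        private final TextNormalizer normalizer;
        private final Vectorizer vectorizer;

        DocumentPipeline(TextExtractor extractor, TextNormalizer normalizer, Vectorizer vectorizer) {
            this.extractor = extractor;
            this.normalizer = normalizer;
            this.vectorizer = vectorizer;
        }

        float[] process(byte[] rawContent) {
            String extracted = extractor.extract(rawContent);
            String canonical = normalizer.normalize(extracted);
            return vectorizer.vectorize(canonical);
        }
    }

    /** Toy implementations; a real vectorizer would call an embedding model. */
    static DocumentPipeline toyPipeline() {
        TextExtractor utf8 = raw -> new String(raw, StandardCharsets.UTF_8);
        TextNormalizer collapse =
                text -> text.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
        Vectorizer counts = text -> {
            // Counts letters, digits, and whitespace as a stand-in "embedding".
            float letters = 0, digits = 0, spaces = 0;
            for (char c : text.toCharArray()) {
                if (Character.isLetter(c)) letters++;
                else if (Character.isDigit(c)) digits++;
                else if (Character.isWhitespace(c)) spaces++;
            }
            return new float[] {letters, digits, spaces};
        };
        return new DocumentPipeline(utf8, collapse, counts);
    }

    public static void main(String[] args) {
        byte[] raw = "  Hello   WORLD 42 ".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(toyPipeline().process(raw))); // [10.0, 2.0, 2.0]
    }
}
```

Keeping each stage behind its own small interface lets a TED-specific extractor and a generic one share the same normalizer and vectorizer, which is the kind of reuse the package split appears to aim for.
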
## Ownership and visibility decision

A tenant represents the owner of a document, but ownership is optional.

A public TED notice therefore does not need a fake tenant. Instead, the canonical model uses:

- optional `ownerTenant`
- mandatory `DocumentVisibility`

Examples:

- TED notice: `ownerTenant = null`, `visibility = PUBLIC`
- customer-private document: `ownerTenant = tenantA`, `visibility = TENANT`
- explicitly shared document: `ownerTenant = tenantA`, `visibility = SHARED`

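The three examples above can be expressed as a small value type with a nullable owner and a required visibility. This is a sketch under assumptions: `ownerTenant` and `DocumentVisibility` come from the text, while `TenantId`, `DocumentMetadata`, `canRead`, and the `sharedWith` set are hypothetical, and the read rule shown is one plausible interpretation rather than the behavior of `DocumentAccessContext`.

```java
import java.util.Objects;
import java.util.Optional;
import java.util.Set;

final class OwnershipSketch {

    enum DocumentVisibility { PUBLIC, TENANT, SHARED }

    /** Hypothetical tenant value object. */
    record TenantId(String value) {
        TenantId {
            Objects.requireNonNull(value, "value");
        }
    }

    /** Hypothetical canonical metadata: owner is optional, visibility is mandatory. */
    record DocumentMetadata(String title, TenantId ownerTenant, DocumentVisibility visibility) {
        DocumentMetadata {
            Objects.requireNonNull(visibility, "visibility is mandatory");
            // ownerTenant may stay null: public documents need no fake tenant.
        }

        Optional<TenantId> owner() {
            return Optional.ofNullable(ownerTenant);
        }
    }

    /** Assumed read rule; sharedWith lists tenants a SHARED document is shared with. */
    static boolean canRead(DocumentMetadata doc, TenantId requester, Set<TenantId> sharedWith) {
        boolean isOwner = requester != null && requester.equals(doc.ownerTenant());
        return switch (doc.visibility()) {
            case PUBLIC -> true;
            case TENANT -> isOwner;
            case SHARED -> isOwner || (requester != null && sharedWith.contains(requester));
        };
    }

    public static void main(String[] args) {
        TenantId tenantA = new TenantId("tenantA");
        TenantId tenantB = new TenantId("tenantB");

        var tedNotice = new DocumentMetadata("TED notice", null, DocumentVisibility.PUBLIC);
        var privateDoc = new DocumentMetadata("Contract", tenantA, DocumentVisibility.TENANT);
        var sharedDoc = new DocumentMetadata("Report", tenantA, DocumentVisibility.SHARED);

        System.out.println(canRead(tedNotice, null, Set.of()));          // true
        System.out.println(canRead(privateDoc, tenantB, Set.of()));      // false
        System.out.println(canRead(sharedDoc, tenantB, Set.of(tenantB))); // true
    }
}
```

Modeling the owner as nullable and the visibility as non-null keeps the type aligned with the stated rule that every document has a visibility even when nobody owns it.
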
Phase 1 now realizes this persistence direction through the additive `DOC` schema. The resulting
backbone uses:

- `DOC.doc_document.owner_tenant_id` nullable
- `DOC.doc_document.visibility` not null

The complete Phase 1 persistence details are documented in `docs/architecture/PHASE1_GENERIC_PERSISTENCE_MODEL.md`.

## Non-goals of Phase 0

- No database schema migration yet
- No runtime behavior changes in TED processing
- No replacement of `ProcurementDocument` yet
- No semantic search refactoring yet

## Result

The codebase now has a stable generalized namespace and contract surface for future phases without
requiring a disruptive rewrite.