# Phase 0 – Architecture Foundation

## New project identity
- **Project name:** Procon Document Intelligence Platform
- **Short name:** DIP
- **Base namespace:** `at.procon.dip`
- **Legacy namespace kept during transition:** `at.procon.ted`

## Why this naming
The application is no longer only a TED notice processor. The new name reflects the broader goal:
import arbitrary document types, derive canonical searchable text, vectorize it, and run semantic
search over those representations.

## Phase 0 decisions implemented in code
1. New Spring Boot entry point under `at.procon.dip`
2. Legacy TED runtime kept through explicit package scanning
3. Generic vocabulary introduced via enums in `at.procon.dip.domain.document`
4. Tenant introduced as a first-class value object in `at.procon.dip.domain.tenant`
5. Ownership and access are explicitly separated through `DocumentAccessContext`
6. Canonical document metadata and ingestion descriptors support both:
   - tenant-owned documents
   - public documents without tenant ownership
7. Extension-point interfaces introduced for ingestion, classification, extraction,
   normalization, and vectorization
8. Target schema split documented as:
   - `DOC` for generic document model
   - `TED` for TED-specific projections
9. Migration strategy formalized as phased additive migration:
   - additive schema
   - dual write
   - backfill
   - cutover
   - retire legacy
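
The migration sequence in item 9 can be sketched as an ordered enum that only allows moving to the directly following phase. This is a minimal illustration, not the project's actual code: the name `MigrationPhase` and the methods `next` and `canAdvanceTo` are assumptions introduced here.

```java
import java.util.Optional;

/** Hypothetical enum mirroring the five migration steps; declaration order is the phase order. */
enum MigrationPhase {
    ADDITIVE_SCHEMA,
    DUAL_WRITE,
    BACKFILL,
    CUTOVER,
    RETIRE_LEGACY;

    /** Returns the phase that follows this one, or empty for the last phase. */
    Optional<MigrationPhase> next() {
        MigrationPhase[] all = values();
        int following = ordinal() + 1;
        return following < all.length ? Optional.of(all[following]) : Optional.empty();
    }

    /** Assumption: phases may not be skipped, so only the direct successor is a valid target. */
    boolean canAdvanceTo(MigrationPhase target) {
        return next().map(n -> n == target).orElse(false);
    }
}

class MigrationPhaseDemo {
    public static void main(String[] args) {
        // Walk the whole sequence starting from the first phase.
        Optional<MigrationPhase> phase = Optional.of(MigrationPhase.ADDITIVE_SCHEMA);
        while (phase.isPresent()) {
            System.out.println(phase.get());
            phase = phase.get().next();
        }
    }
}
```

Encoding the order in one place keeps "which phase are we in" checks out of scattered conditionals; whether skipping phases should really be forbidden is a policy choice this sketch merely assumes.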

## Planned package areas
- `at.procon.dip.architecture`
- `at.procon.dip.domain.access`
- `at.procon.dip.domain.document`
- `at.procon.dip.domain.tenant`
- `at.procon.dip.ingestion.spi`
- `at.procon.dip.classification.spi`
- `at.procon.dip.extraction.spi`
- `at.procon.dip.normalization.spi`
- `at.procon.dip.vectorization.spi`
- `at.procon.dip.search.spi`
- `at.procon.dip.processing.spi`
- `at.procon.dip.migration`
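
To give a feel for what one of the `.spi` packages above might contain, here is a rough sketch of a normalization extension point. The real interface signatures are not given in this document, so `TextNormalizer`, `WhitespaceNormalizer`, `NormalizerChain`, and the `"TED_NOTICE"` type string are all hypothetical names chosen for illustration.

```java
import java.util.List;

/** Hypothetical extension point; the actual SPI in at.procon.dip.normalization.spi may differ. */
interface TextNormalizer {
    /** Whether this normalizer applies to the given document type. */
    boolean supports(String documentType);

    /** Returns the normalized form of the given text. */
    String normalize(String rawText);
}

/** Example implementation: trims the text and collapses whitespace runs to single spaces. */
final class WhitespaceNormalizer implements TextNormalizer {
    @Override
    public boolean supports(String documentType) {
        return true; // applies to every document type
    }

    @Override
    public String normalize(String rawText) {
        return rawText.strip().replaceAll("\\s+", " ");
    }
}

/** Applies every supporting normalizer in registration order. */
final class NormalizerChain {
    private final List<TextNormalizer> normalizers;

    NormalizerChain(List<TextNormalizer> normalizers) {
        this.normalizers = List.copyOf(normalizers);
    }

    String normalize(String documentType, String rawText) {
        String text = rawText;
        for (TextNormalizer normalizer : normalizers) {
            if (normalizer.supports(documentType)) {
                text = normalizer.normalize(text);
            }
        }
        return text;
    }
}

class NormalizerDemo {
    public static void main(String[] args) {
        NormalizerChain chain = new NormalizerChain(List.of(new WhitespaceNormalizer()));
        System.out.println(chain.normalize("TED_NOTICE", "  Contract \n notice  "));
    }
}
```

The `supports` check is one common way to let TED-specific and generic implementations coexist behind the same contract; the actual SPI may select implementations differently.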

## Ownership and visibility decision
A tenant represents the owner of a document, but ownership is optional.

A public TED notice therefore does not need a fake tenant. Instead, the canonical model uses:
- optional `ownerTenant`
- mandatory `DocumentVisibility`

Examples:
- TED notice: `ownerTenant = null`, `visibility = PUBLIC`
- customer-private document: `ownerTenant = tenantA`, `visibility = TENANT`
- explicitly shared document: `ownerTenant = tenantA`, `visibility = SHARED`
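
The rule "optional owner, mandatory visibility" can be sketched in plain Java as follows. `DocumentVisibility`, `DocumentAccessContext`, and `ownerTenant` are named in this document, but the shape shown here is an assumption: the field layout, the factory methods `publicDocument` and `ownedBy`, and the use of a `String` as a stand-in for the tenant value object are illustrative only.

```java
import java.util.Objects;
import java.util.Optional;

/** The three visibility values used in the examples above. */
enum DocumentVisibility { PUBLIC, TENANT, SHARED }

/** Assumed shape: owner may be absent, visibility may not. */
final class DocumentAccessContext {
    private final String ownerTenant;            // stand-in for the tenant value object; null = no owner
    private final DocumentVisibility visibility; // never null

    DocumentAccessContext(String ownerTenant, DocumentVisibility visibility) {
        this.ownerTenant = ownerTenant;
        this.visibility = Objects.requireNonNull(visibility, "visibility is mandatory");
    }

    /** Hypothetical factory for public documents such as TED notices. */
    static DocumentAccessContext publicDocument() {
        return new DocumentAccessContext(null, DocumentVisibility.PUBLIC);
    }

    /** Hypothetical factory for documents that belong to a tenant. */
    static DocumentAccessContext ownedBy(String tenant, DocumentVisibility visibility) {
        return new DocumentAccessContext(Objects.requireNonNull(tenant, "tenant"), visibility);
    }

    /** Exposes the optional owner without leaking null to callers. */
    Optional<String> ownerTenant() {
        return Optional.ofNullable(ownerTenant);
    }

    DocumentVisibility visibility() {
        return visibility;
    }
}

class OwnershipDemo {
    public static void main(String[] args) {
        DocumentAccessContext tedNotice = DocumentAccessContext.publicDocument();
        DocumentAccessContext privateDoc =
                DocumentAccessContext.ownedBy("tenantA", DocumentVisibility.TENANT);
        System.out.println(tedNotice.ownerTenant().isPresent() + " " + tedNotice.visibility());
        System.out.println(privateDoc.ownerTenant().get() + " " + privateDoc.visibility());
    }
}
```

Modeling the owner as an `Optional` at the API boundary makes the "no fake tenant" decision visible to callers instead of hiding it behind a sentinel value.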

Phase 1 implements this decision in the additive `DOC` schema. The resulting
backbone table uses:
- `DOC.doc_document.owner_tenant_id` nullable
- `DOC.doc_document.visibility` not null

The complete Phase 1 persistence details are documented in `docs/architecture/PHASE1_GENERIC_PERSISTENCE_MODEL.md`.

## Non-goals of Phase 0
- No database schema migration yet
- No runtime behavior changes in TED processing
- No replacement of `ProcurementDocument` yet
- No semantic search refactoring yet

## Result
The codebase now has a stable generalized namespace and contract surface for future phases without
requiring a disruptive rewrite.