
Phase 0 Architecture Foundation

New project identity

  • Project name: Procon Document Intelligence Platform
  • Short name: DIP
  • Base namespace: at.procon.dip
  • Legacy namespace kept during transition: at.procon.ted

Why this naming

The application is no longer only a TED notice processor. The new name reflects the broader goal: import arbitrary document types, derive canonical searchable text, vectorize it, and run semantic search over those representations.

Phase 0 decisions implemented in code

  1. New Spring Boot entry point under at.procon.dip
  2. Legacy TED runtime kept through explicit package scanning
  3. Generic vocabulary introduced via enums in at.procon.dip.domain.document
  4. Tenant introduced as a first-class value object in at.procon.dip.domain.tenant
  5. Ownership and access are explicitly separated through DocumentAccessContext
  6. Canonical document metadata and ingestion descriptors support both:
    • tenant-owned documents
    • public documents without tenant ownership
  7. Extension-point interfaces introduced for ingestion, classification, extraction, normalization, and vectorization
  8. Target schema split documented as:
    • DOC for generic document model
    • TED for TED-specific projections
  9. Migration strategy formalized as phased additive migration:
    • additive schema
    • dual write
    • backfill
    • cutover
    • retire legacy
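
The five migration steps in decision 9 can be read as an ordered state machine. The following is a minimal sketch only: the enum name `MigrationPhase`, its flags, and the choice of which store is written or read in each phase are assumptions made for illustration and are not taken from the project code.

```java
// Hypothetical sketch of the phased additive migration.
// The read/write flags per phase are illustrative assumptions.
enum MigrationPhase {
    ADDITIVE_SCHEMA(true, false, false), // new DOC tables exist but are not used yet
    DUAL_WRITE(true, true, false),       // every write goes to legacy and DOC
    BACKFILL(true, true, false),         // historical rows are copied into DOC
    CUTOVER(true, true, true),           // reads switch to DOC, legacy still written
    RETIRE_LEGACY(false, true, true);    // legacy writes stop

    private final boolean writesLegacy;
    private final boolean writesDoc;
    private final boolean readsFromDoc;

    MigrationPhase(boolean writesLegacy, boolean writesDoc, boolean readsFromDoc) {
        this.writesLegacy = writesLegacy;
        this.writesDoc = writesDoc;
        this.readsFromDoc = readsFromDoc;
    }

    boolean writesLegacy() { return writesLegacy; }
    boolean writesDoc() { return writesDoc; }
    boolean readsFromDoc() { return readsFromDoc; }

    // Phases only move forward; the last phase is terminal.
    MigrationPhase next() {
        return this == RETIRE_LEGACY ? this : values()[ordinal() + 1];
    }
}
```

Because each phase only adds capability before anything is removed, a rollback before cutover would only require ignoring the new DOC tables, under the assumptions above.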

Planned package areas

  • at.procon.dip.architecture
  • at.procon.dip.domain.access
  • at.procon.dip.domain.document
  • at.procon.dip.domain.tenant
  • at.procon.dip.ingestion.spi
  • at.procon.dip.classification.spi
  • at.procon.dip.extraction.spi
  • at.procon.dip.normalization.spi
  • at.procon.dip.vectorization.spi
  • at.procon.dip.search.spi
  • at.procon.dip.processing.spi
  • at.procon.dip.migration
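
To illustrate what a contract in one of the `spi` packages might look like, here is a minimal sketch of an extraction extension point with a small registry. All type and method names (`SourceDescriptor`, `TextExtractor`, `PlainTextExtractor`, `ExtractorRegistry`) are hypothetical and chosen for this example; the actual interfaces in `at.procon.dip.extraction.spi` may look different.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;

// Hypothetical descriptor of an incoming document source.
record SourceDescriptor(String fileName, String mediaType) {}

// Hypothetical extension point: one implementation per supported input type.
interface TextExtractor {
    boolean supports(SourceDescriptor source);
    String extract(SourceDescriptor source, byte[] content);
}

// Example implementation for plain text input.
final class PlainTextExtractor implements TextExtractor {
    @Override
    public boolean supports(SourceDescriptor source) {
        return "text/plain".equals(source.mediaType());
    }

    @Override
    public String extract(SourceDescriptor source, byte[] content) {
        return new String(content, StandardCharsets.UTF_8).strip();
    }
}

// Selects the first registered extractor that supports a given source.
final class ExtractorRegistry {
    private final List<TextExtractor> extractors;

    ExtractorRegistry(List<TextExtractor> extractors) {
        this.extractors = List.copyOf(extractors);
    }

    Optional<TextExtractor> find(SourceDescriptor source) {
        return extractors.stream().filter(e -> e.supports(source)).findFirst();
    }
}
```

With this shape, supporting a new document type would mean adding another implementation rather than changing the existing TED processing code.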

Ownership and visibility decision

A tenant represents the owner of a document, but ownership is optional.

A public TED notice therefore does not need a fake tenant. Instead, the canonical model uses:

  • optional ownerTenant
  • mandatory DocumentVisibility

Examples:

  • TED notice: ownerTenant = null, visibility = PUBLIC
  • customer-private document: ownerTenant = tenantA, visibility = TENANT
  • explicitly shared document: ownerTenant = tenantA, visibility = SHARED
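
The three examples above can be sketched in Java as follows. This is an illustration under assumptions, not the project's actual code: `DocumentVisibility` and `DocumentAccessContext` are named in this document, but their fields, the `TenantRef` type, and the factory methods shown here are hypothetical.

```java
import java.util.Objects;
import java.util.Optional;

// Visibility values as used in the examples above.
enum DocumentVisibility { PUBLIC, TENANT, SHARED }

// Hypothetical tenant value object, identified by a key only.
record TenantRef(String tenantKey) {
    TenantRef {
        Objects.requireNonNull(tenantKey, "tenantKey");
    }
}

// Ownership is optional (ownerTenant may be null); visibility is mandatory.
record DocumentAccessContext(TenantRef ownerTenant, DocumentVisibility visibility) {
    DocumentAccessContext {
        Objects.requireNonNull(visibility, "visibility");
    }

    // A public document such as a TED notice needs no fake tenant.
    static DocumentAccessContext publicDocument() {
        return new DocumentAccessContext(null, DocumentVisibility.PUBLIC);
    }

    // Tenant-owned documents must name their owner.
    static DocumentAccessContext tenantOwned(TenantRef owner, DocumentVisibility visibility) {
        Objects.requireNonNull(owner, "owner");
        return new DocumentAccessContext(owner, visibility);
    }

    // Null-safe view of the optional owner.
    Optional<TenantRef> owner() {
        return Optional.ofNullable(ownerTenant);
    }
}
```

For example, `DocumentAccessContext.publicDocument()` models the TED notice case, while `DocumentAccessContext.tenantOwned(new TenantRef("tenantA"), DocumentVisibility.SHARED)` models an explicitly shared document.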

Phase 1 implements this decision in the additive DOC schema. The central document table defines:

  • DOC.doc_document.owner_tenant_id (nullable)
  • DOC.doc_document.visibility (not null)

The complete Phase 1 persistence details are documented in docs/architecture/PHASE1_GENERIC_PERSISTENCE_MODEL.md.

Non-goals of Phase 0

  • No database schema migration yet
  • No runtime behavior changes in TED processing
  • No replacement of ProcurementDocument yet
  • No semantic search refactoring yet

Result

The codebase now has a stable, generalized namespace and a set of contracts that later phases can build on without a disruptive rewrite.