# Phase 4.1 – TED package and mail/document adapters

This phase extends the generic DOC ingestion SPI with two richer adapters:

- `TedPackageDocumentIngestionAdapter`
- `MailDocumentIngestionAdapter`

## TED package adapter

- imports the package artifact itself as a public DOC document
- expands the `.tar.gz` package into XML child payloads
- imports each child XML as a generic DOC child document
- links children to the package root via `EXTRACTED_FROM`
- keeps the existing legacy TED package processing path intact

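
The root/child structure described for the TED package adapter can be sketched as follows. This is a minimal illustration, not the adapter's actual code: `DocNode`, `DocRelation`, `ImportResult` and `TedPackageImportSketch` are hypothetical names, the archive is assumed to be unpacked already, and filtering entries by an `.xml` suffix is an assumption about how XML children are recognised.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified types; the real DOC model is not shown in this document.
record DocNode(String id, String visibility, String ownerTenantKey) {}

record DocRelation(String childId, String parentId, String relationType) {}

record ImportResult(DocNode root, List<DocNode> children, List<DocRelation> relations) {}

final class TedPackageImportSketch {

    // entryNames: names of the entries found in the already unpacked .tar.gz package.
    static ImportResult importPackage(String packageId, List<String> entryNames) {
        // The package artifact itself becomes a PUBLIC root document with no owner tenant.
        DocNode root = new DocNode(packageId, "PUBLIC", null);
        List<DocNode> children = new ArrayList<>();
        List<DocRelation> relations = new ArrayList<>();
        for (String entryName : entryNames) {
            // Assumption: only entries ending in .xml are imported as children.
            if (!entryName.toLowerCase(Locale.ROOT).endsWith(".xml")) {
                continue;
            }
            DocNode child = new DocNode(packageId + "/" + entryName, "PUBLIC", null);
            children.add(child);
            // Each child points back at the package root.
            relations.add(new DocRelation(child.id(), root.id(), "EXTRACTED_FROM"));
        }
        return new ImportResult(root, children, relations);
    }
}
```

Calling `TedPackageImportSketch.importPackage("pkg-1", List.of("a.xml", "readme.txt"))` in this sketch yields one root, one child and one `EXTRACTED_FROM` relation. The legacy TED processing path is deliberately not modelled here.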
## Mail/document adapter

- imports the MIME message as a DOC document
- extracts subject/from/to/body into the mail root semantic text
- imports attachments as child DOC documents
- links attachments via `ATTACHMENT_OF`
- optionally expands ZIP attachments recursively

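
Recursive ZIP expansion of an attachment could look roughly like the sketch below. It uses only `java.util.zip` from the JDK. `ExpandedEntry`, `ZipAttachmentExpansionSketch`, the `!/` path separator, the `.zip` suffix check and the `maxDepth` limit are all assumptions made for illustration; the document does not say how the adapter detects ZIP content or bounds the recursion.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical holder for one leaf payload found inside an attachment.
record ExpandedEntry(String path, byte[] content) {}

final class ZipAttachmentExpansionSketch {

    // Returns the leaf payloads of an attachment, descending into nested ZIPs up to maxDepth levels.
    static List<ExpandedEntry> expand(String name, byte[] content, int maxDepth) throws IOException {
        List<ExpandedEntry> out = new ArrayList<>();
        expandInto(name, content, maxDepth, out);
        return out;
    }

    private static void expandInto(String path, byte[] content, int depthLeft, List<ExpandedEntry> out)
            throws IOException {
        boolean isZip = path.toLowerCase(Locale.ROOT).endsWith(".zip");
        if (!isZip || depthLeft <= 0) {
            // Not a ZIP, or depth budget used up: keep the payload as a leaf.
            out.add(new ExpandedEntry(path, content));
            return;
        }
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(content))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                byte[] bytes = zip.readAllBytes(); // reads up to the end of the current entry
                expandInto(path + "!/" + entry.getName(), bytes, depthLeft - 1, out);
            }
        }
    }
}
```

The sketch returns only leaf payloads; a ZIP wrapper document and the `ATTACHMENT_OF` links would be created by the surrounding import code. A production version would probably also cap the total uncompressed size to guard against oversized archives.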
## Access semantics

- TED packages and TED XML children are imported as `PUBLIC` with no owner tenant
- mail documents use a dedicated default mail access context (`mail-default-owner-tenant-key`, `mail-default-visibility`)
- deduplication is access-scope aware so private content is not merged across different tenants

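
One way to make deduplication access-scope aware is to fold the access scope into the deduplication key, as in the sketch below. This is an illustration under assumptions, not the service's actual algorithm: `AccessScopedDedupSketch`, the key layout, the use of SHA-256 and the `PRIVATE` visibility value are hypothetical (only `PUBLIC` is named in this document).

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class AccessScopedDedupSketch {

    // Builds a dedup key from the access scope plus a content hash.
    static String dedupKey(byte[] content, String visibility, String ownerTenantKey) {
        // PUBLIC content has no owner tenant, so all public copies share one scope.
        String scope = "PUBLIC".equals(visibility) ? "PUBLIC" : visibility + ":" + ownerTenantKey;
        return scope + "|" + sha256Hex(content);
    }

    static String sha256Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(content);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

With a key of this shape, identical bytes imported for two different tenants produce two different keys and are therefore not merged, while identical public content maps to a single key.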

Additional note:

- wrapper/container documents (for example TED package roots or ZIP wrapper documents expanded into child documents) can skip persistence of ORIGINAL content via `ted.generic-ingestion.store-original-content-for-wrapper-documents=false`, while still keeping metadata, derived representations and child relations
- adapters can now override that default per imported document through `SourceDescriptor.originalContentStoragePolicy` (`STORE` / `SKIP` / `DEFAULT`)
- when original content storage is skipped for a document, `GenericDocumentImportService` now also skips extraction, derived-content persistence, representation building, and embedding queueing for that document

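
The interplay between the global property and the per-document policy might resolve as in the following sketch. The enum type name `OriginalContentStoragePolicy`, the class `OriginalContentStorageSketch` and the method signature are hypothetical; in particular, treating `DEFAULT` on a non-wrapper document as "store" is an assumption, since the document only describes the default for wrapper documents.

```java
// Values taken from the note above; the enum type name itself is an assumption.
enum OriginalContentStoragePolicy { STORE, SKIP, DEFAULT }

final class OriginalContentStorageSketch {

    // storeOriginalForWrappers mirrors
    // ted.generic-ingestion.store-original-content-for-wrapper-documents.
    static boolean shouldStoreOriginal(OriginalContentStoragePolicy policy,
                                       boolean wrapperDocument,
                                       boolean storeOriginalForWrappers) {
        switch (policy) {
            case STORE:
                return true;  // explicit per-document override
            case SKIP:
                return false; // explicit per-document override
            default:
                // DEFAULT: wrapper documents follow the global property;
                // other documents are assumed to keep their original content.
                return !wrapperDocument || storeOriginalForWrappers;
        }
    }
}
```

In this reading, an adapter that passes `SKIP` suppresses ORIGINAL content even for a non-wrapper document, and `STORE` keeps it even when the global property is `false`.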

Schema note:

- `V8__doc_phase4_1_expand_document_and_source_types.sql` expands the generic `DOC` document/source type domain for `TED_PACKAGE` and `PACKAGE_CHILD`, and also repairs older local/dev schemas that used CHECK constraints instead of PostgreSQL ENUM types.