Phase 4.1 adapter extensions

Added adapters

TED package adapter

  • Source type: TED_PACKAGE
  • Root access: PUBLIC, no owner tenant
  • Root document type: TED_PACKAGE
  • Child source type: PACKAGE_CHILD
  • Child relation: EXTRACTED_FROM

The adapter imports the package artifact and its XML members into the generic DOC model. It complements the existing legacy TED package processing path rather than replacing it: the later legacy TED parsing step still finds the same canonical child documents by dedup hash and enriches them into proper TED_NOTICE projections.
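The interplay between the adapter and the legacy step can be pictured roughly as follows. This is a minimal sketch, not the actual implementation: the `Document` class, the `import_ted_package` and `legacy_enrich` functions, the in-memory `index`, the child document type `"XML"`, and the use of SHA-256 as the dedup hash are all assumptions made for illustration. The package is assumed to be already unpacked into a name-to-bytes mapping, and children are assumed to inherit the root's public access.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Document:
    # Hypothetical stand-in for a record in the generic DOC model.
    source_type: str
    document_type: str
    visibility: str
    owner_tenant: Optional[str]
    dedup_hash: str
    relation: Optional[str] = None            # relation to the parent document
    projections: List[str] = field(default_factory=list)
    children: List["Document"] = field(default_factory=list)


def sha256_hex(data: bytes) -> str:
    # Assumption: the dedup hash is a SHA-256 digest of the raw payload.
    return hashlib.sha256(data).hexdigest()


def import_ted_package(package_bytes: bytes,
                       members: Dict[str, bytes],
                       index: Dict[str, Document]) -> Document:
    """Create a public root document plus one child per XML member."""
    root = Document("TED_PACKAGE", "TED_PACKAGE", "PUBLIC", None,
                    sha256_hex(package_bytes))
    for name, data in members.items():
        if not name.lower().endswith(".xml"):
            continue  # only XML members are imported as children
        child = Document("PACKAGE_CHILD", "XML", "PUBLIC", None,
                         sha256_hex(data), relation="EXTRACTED_FROM")
        index[child.dedup_hash] = child
        root.children.append(child)
    return root


def legacy_enrich(xml_bytes: bytes,
                  index: Dict[str, Document]) -> Optional[Document]:
    """Find the canonical child by dedup hash and add a TED_NOTICE projection."""
    doc = index.get(sha256_hex(xml_bytes))
    if doc is not None and "TED_NOTICE" not in doc.projections:
        doc.projections.append("TED_NOTICE")
    return doc
```

Because both steps derive the same hash from the same XML bytes, the legacy step ends up on the child document the adapter already created instead of producing a second one.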

Mail/document adapter

  • Source type: MAIL
  • Root document type: MIME_MESSAGE
  • Child relation: ATTACHMENT_OF
  • Access: configurable via mail-default-owner-tenant-key and mail-default-visibility

The adapter stores the message body as the semantic root text and imports attachments as child documents. ZIP attachments can optionally be expanded recursively.
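The mapping described above can be sketched with Python's standard `email` and `zipfile` modules. This is an illustrative sketch under stated assumptions, not the adapter's code: the `Doc` class, the `import_mail` and `expand_zip` functions, the `expand_zips` flag, and the `"FILE"` document type are hypothetical, and using EXTRACTED_FROM for ZIP members is borrowed from the TED package adapter rather than confirmed for mail.

```python
import io
import zipfile
from dataclasses import dataclass, field
from email import message_from_bytes, policy
from typing import List, Optional


@dataclass
class Doc:
    # Hypothetical stand-in for a document in the generic DOC model.
    name: str
    document_type: str
    text: str = ""
    payload: bytes = b""
    relation: Optional[str] = None        # relation to the parent document
    children: List["Doc"] = field(default_factory=list)


def expand_zip(payload: bytes) -> List[Doc]:
    """Turn ZIP members into child documents, recursing into nested ZIPs."""
    docs: List[Doc] = []
    with zipfile.ZipFile(io.BytesIO(payload)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            data = zf.read(info)
            # Assumption: ZIP members use the EXTRACTED_FROM relation.
            child = Doc(info.filename, "FILE", payload=data,
                        relation="EXTRACTED_FROM")
            if info.filename.lower().endswith(".zip"):
                child.children = expand_zip(data)
            docs.append(child)
    return docs


def import_mail(raw: bytes, expand_zips: bool = False) -> Doc:
    """Body becomes the root text; attachments become child documents."""
    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain", "html"))
    root = Doc(str(msg.get("Subject", "")), "MIME_MESSAGE",
               text=body.get_content() if body is not None else "")
    for part in msg.iter_attachments():
        data = part.get_payload(decode=True) or b""
        name = part.get_filename() or "attachment"
        child = Doc(name, "FILE", payload=data, relation="ATTACHMENT_OF")
        if expand_zips and name.lower().endswith(".zip"):
            child.children = expand_zip(data)
        root.children.append(child)
    return root
```

With `expand_zips=False` a ZIP attachment stays a single opaque child; with `expand_zips=True` its members, and those of any nested ZIP, hang below it.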

Deduplication

Phase 4 deduplication by content hash is refined: the same payload is now deduplicated only within the same access scope (visibility + owner tenant). This prevents private documents from different tenants from being accidentally merged into one canonical document.
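One way to picture the scoped rule is a dedup key that combines the content hash with the access scope. The sketch below is illustrative only: `AccessScope`, `dedup_key`, and `CanonicalStore` are hypothetical names, SHA-256 is an assumed hash function, and the visibility values are examples.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

DedupKey = Tuple[str, str, Optional[str]]


@dataclass(frozen=True)
class AccessScope:
    visibility: str                  # e.g. "PUBLIC" or "PRIVATE" (example values)
    owner_tenant: Optional[str]      # None when there is no owner tenant


def dedup_key(payload: bytes, scope: AccessScope) -> DedupKey:
    # Assumption: the content hash is SHA-256 over the raw payload.
    content_hash = hashlib.sha256(payload).hexdigest()
    return (content_hash, scope.visibility, scope.owner_tenant)


class CanonicalStore:
    """Hands out one canonical document id per (hash, scope) combination."""

    def __init__(self) -> None:
        self._by_key: Dict[DedupKey, int] = {}
        self._next_id = 1

    def resolve(self, payload: bytes, scope: AccessScope) -> int:
        key = dedup_key(payload, scope)
        if key not in self._by_key:
            self._by_key[key] = self._next_id
            self._next_id += 1
        return self._by_key[key]
```

Under this scheme, the same bytes uploaded privately by two tenants resolve to two canonical documents, while a repeated upload within one tenant's scope resolves to the existing one.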