Mail Processing Stabilization Phase — Step 1

This step implements the first practical slice of the mail-processing stabilization work:

  • generic mail-provider contract
  • provider-aware source identifiers for idempotent import
  • typed mail metadata persistence
  • attachment occurrence tracking
  • current Camel/IMAP route adapted to the generic provider contract

Included scope

1. Generic mail provider contract

Added a generic abstraction so the ingestion pipeline does not depend on IMAP-specific semantics:

  • MailProviderType
  • MailProviderEnvelope
  • GenericMailProviderEnvelope
  • MailProviderEnvelopeAttributes

The current implementation uses GenericMailProviderEnvelope for the existing Camel IMAP route. Future providers such as POP3, EWS, Microsoft Graph, Gmail API, or replay/file sources can implement the same contract.
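As a rough illustration of what such a contract could look like, here is a minimal sketch. Only the type names MailProviderType, MailProviderEnvelope, and GenericMailProviderEnvelope come from this document; the enum constants, fields, and accessor names are assumptions and may differ from the actual classes.

```java
import java.util.Optional;

// Hypothetical sketch: the type names appear in this document, but every
// constant, field, and method below is an assumption for illustration.
enum MailProviderType { IMAP, POP3, EWS, MICROSOFT_GRAPH, GMAIL_API, REPLAY_FILE }

// Provider-neutral view of where a message came from.
interface MailProviderEnvelope {
    MailProviderType providerType();
    String accountKey();
    String folderKey();
    // Provider-specific message key (for IMAP this might be derived from the UID).
    Optional<String> providerMessageKey();
    // Not every provider exposes a thread or conversation key.
    Optional<String> providerThreadKey();
}

// One plain implementation that any provider adapter could populate.
record GenericMailProviderEnvelope(
        MailProviderType providerType,
        String accountKey,
        String folderKey,
        Optional<String> providerMessageKey,
        Optional<String> providerThreadKey) implements MailProviderEnvelope {}
```

With a shape like this, the ingestion pipeline depends only on the interface, and an IMAP route fills in whatever keys it has while leaving the rest empty.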

2. Provider-aware idempotency foundation

Added MailImportIdentityResolver to derive stable source identifiers for:

  • root mail message
  • attachment occurrences

Priority for root message identity:

  1. provider message key
  2. Message-ID
  3. raw MIME hash

This allows the import path to remain restart-safe and replay-safe even when content-hash-only deduplication is insufficient.
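The priority order above can be sketched as a simple fallback chain. This is not the actual MailImportIdentityResolver: the class name, method names, identifier prefixes, and the choice of SHA-256 are assumptions. The attachment identifier format (root identifier plus MIME part path) is likewise only one plausible way to tell two occurrences of the same file apart.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical sketch of the identity fallback chain; not the project's resolver.
final class MailIdentitySketch {

    // Priority: provider message key, then Message-ID, then hash of the raw MIME bytes.
    static String resolveRootId(String providerMessageKey, String messageId, byte[] rawMime) {
        if (providerMessageKey != null && !providerMessageKey.isBlank()) {
            return "provider:" + providerMessageKey.trim();
        }
        if (messageId != null && !messageId.isBlank()) {
            return "message-id:" + messageId.trim();
        }
        return "mime-sha256:" + sha256Hex(rawMime);
    }

    // Assumed format: an attachment occurrence is identified by its parent
    // message plus its position in the MIME tree, not by its content.
    static String resolveAttachmentId(String rootId, String partPath) {
        return rootId + "#part=" + partPath;
    }

    static String sha256Hex(byte[] data) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(data));
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a mandatory algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

Prefixing each identifier with its origin keeps the three namespaces from colliding, so a Message-ID can never be mistaken for a provider key that happens to contain the same text.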

3. Generic source-id idempotency in document import

GenericDocumentImportService now checks for an existing DOC.doc_source row using:

  • source_type
  • external_source_id

before content-hash deduplication.

This makes source-identifier idempotency reusable for sources other than mail.
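The lookup order can be illustrated with an in-memory sketch. The maps stand in for DOC.doc_source and the document store, and all class and method names are hypothetical. What happens on a content-hash match (here: the new source is linked to the existing document) is an assumption, not something this document specifies.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory sketch of "source id first, content hash second".
final class ImportDedupSketch {
    record SourceKey(String sourceType, String externalSourceId) {}

    private final Map<SourceKey, Long> documentBySource = new HashMap<>();
    private final Map<String, Long> documentByContentHash = new HashMap<>();
    private long nextDocumentId = 1;

    // Returns the id of the existing or newly created document.
    long importDocument(String sourceType, String externalSourceId, String contentHash) {
        SourceKey key = new SourceKey(sourceType, externalSourceId);

        // 1. Same source seen before: the import is a no-op, even if the
        //    content hash differs from the earlier run.
        Long known = documentBySource.get(key);
        if (known != null) {
            return known;
        }

        // 2. Unknown source: fall back to content-hash deduplication.
        Long sameContent = documentByContentHash.get(contentHash);
        long documentId = (sameContent != null) ? sameContent : nextDocumentId++;

        documentByContentHash.putIfAbsent(contentHash, documentId);
        documentBySource.put(key, documentId);
        return documentId;
    }
}
```

Checking the source key first means a restarted or replayed import of the same message resolves to the same document without comparing content at all.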

4. Typed mail metadata persistence

Added new DOC metadata tables/entities:

  • DOC.doc_mail_message
  • DOC.doc_mail_recipient
  • DOC.doc_mail_attachment

These persist:

  • provider/account/folder/message/thread keys
  • Message-ID, In-Reply-To, References
  • normalized subject
  • sender/recipients
  • attachment occurrence metadata
  • part path / archive path / disposition / content-id
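This document does not say how the subject is normalized. One common approach, shown here purely as an assumption, is to strip leading reply/forward markers, collapse whitespace, and lowercase the result so that replies share a subject with the original message. The class name and the prefix list are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch of subject normalization; the project's rules may differ.
final class SubjectNormalizerSketch {
    // Assumed list of leading reply/forward markers (English and German variants).
    private static final Pattern PREFIXES =
            Pattern.compile("^(?:(?:re|aw|fw|fwd|wg)\\s*:\\s*)+", Pattern.CASE_INSENSITIVE);

    static String normalize(String subject) {
        if (subject == null) {
            return "";
        }
        // Collapse runs of whitespace before matching the prefixes.
        String collapsed = subject.trim().replaceAll("\\s+", " ");
        return PREFIXES.matcher(collapsed).replaceFirst("").toLowerCase(Locale.ROOT);
    }
}
```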

5. Attachment source typing

Attachments imported from mail now use:

  • SourceType.MAIL_ATTACHMENT

instead of the generic MAIL source type.

6. Camel IMAP route integration

The existing Camel mail route now emits generic provider metadata into SourceDescriptor.attributes() using the new provider contract.
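A minimal sketch of how provider metadata might be flattened into a string attribute map follows. The attribute keys, the class name, and the method signature are hypothetical and not taken from the project, and the sketch assumes that SourceDescriptor.attributes() holds string key/value pairs.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: flattening provider metadata into string attributes.
final class EnvelopeAttributesSketch {
    // Hypothetical attribute keys, not taken from the project.
    static final String PROVIDER_TYPE = "mail.provider.type";
    static final String ACCOUNT_KEY = "mail.provider.accountKey";
    static final String FOLDER_KEY = "mail.provider.folderKey";
    static final String MESSAGE_KEY = "mail.provider.messageKey";

    static Map<String, String> toAttributes(String providerType, String accountKey,
                                            String folderKey, String messageKey) {
        Map<String, String> attrs = new LinkedHashMap<>();
        putIfPresent(attrs, PROVIDER_TYPE, providerType);
        putIfPresent(attrs, ACCOUNT_KEY, accountKey);
        putIfPresent(attrs, FOLDER_KEY, folderKey);
        putIfPresent(attrs, MESSAGE_KEY, messageKey);
        return Collections.unmodifiableMap(attrs);
    }

    // Missing values are omitted rather than stored as empty strings.
    private static void putIfPresent(Map<String, String> target, String key, String value) {
        if (value != null && !value.isBlank()) {
            target.put(key, value);
        }
    }
}
```

Keeping the keys in one place lets the route that writes them and the adapter that reads them agree on names without sharing IMAP-specific types.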

Not yet included

The following are intentionally left for the next step:

  • replay/reprocess workflows
  • import/reprocess run tracking tables
  • failed attachment retry services
  • thread-aware search/reporting
  • admin/ops visibility endpoints or Camel admin routes

Main implementation files

New files

  • src/main/java/at/procon/dip/ingestion/mail/MailProviderType.java
  • src/main/java/at/procon/dip/ingestion/mail/MailProviderEnvelope.java
  • src/main/java/at/procon/dip/ingestion/mail/GenericMailProviderEnvelope.java
  • src/main/java/at/procon/dip/ingestion/mail/MailProviderEnvelopeAttributes.java
  • src/main/java/at/procon/dip/ingestion/mail/MailImportIdentityResolver.java
  • src/main/java/at/procon/dip/domain/document/entity/DocumentMailMessage.java
  • src/main/java/at/procon/dip/domain/document/entity/DocumentMailRecipient.java
  • src/main/java/at/procon/dip/domain/document/entity/DocumentMailAttachment.java
  • src/main/java/at/procon/dip/domain/document/entity/MailRecipientType.java
  • src/main/java/at/procon/dip/domain/document/repository/DocumentMailMessageRepository.java
  • src/main/java/at/procon/dip/domain/document/repository/DocumentMailRecipientRepository.java
  • src/main/java/at/procon/dip/domain/document/repository/DocumentMailAttachmentRepository.java
  • src/main/java/at/procon/dip/ingestion/service/MailMetadataPersistenceService.java
  • src/main/resources/db/migration/V23__doc_mail_processing_stabilization_step1.sql

Modified files

  • src/main/java/at/procon/dip/domain/document/SourceType.java
  • src/main/java/at/procon/dip/domain/document/repository/DocumentSourceRepository.java
  • src/main/java/at/procon/dip/ingestion/service/GenericDocumentImportService.java
  • src/main/java/at/procon/dip/ingestion/service/MailMessageExtractionService.java
  • src/main/java/at/procon/dip/ingestion/adapter/MailDocumentIngestionAdapter.java
  • src/main/java/at/procon/ted/camel/MailRoute.java

Next, proceed with Step 2 of the Mail Processing Stabilization Phase, which covers:

  • replay/reprocess services
  • failed attachment retry flow
  • import/reprocess run tracking
  • reporting / operational visibility