Mail Processing Stabilization Phase — Step 1
This step implements the first practical slice of the mail-processing stabilization work:
- generic mail-provider contract
- provider-aware source identifiers for idempotent import
- typed mail metadata persistence
- attachment occurrence tracking
- current Camel/IMAP route adapted to the generic provider contract
Included scope
1. Generic mail provider contract
Added a generic abstraction so the ingestion pipeline does not depend on IMAP-specific semantics:
- MailProviderType
- MailProviderEnvelope
- GenericMailProviderEnvelope
- MailProviderEnvelopeAttributes
The current implementation uses GenericMailProviderEnvelope for the existing Camel IMAP route.
Future providers such as POP3, EWS, Microsoft Graph, Gmail API, or replay/file sources can use the same contract.
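As an illustration of the contract idea, the following is a minimal sketch, not the project's actual code. The type names MailProviderType, MailProviderEnvelope and GenericMailProviderEnvelope come from the list above; the enum constants, every field and accessor, and the EnvelopeDescriber helper are assumptions made up for this example.

```java
import java.util.Optional;

// Hypothetical sketch: only the three type names below appear in this document.
// All constants, fields and method signatures are illustrative assumptions.
enum MailProviderType { IMAP, POP3, EWS, MICROSOFT_GRAPH, GMAIL_API, REPLAY_FILE }

interface MailProviderEnvelope {
    MailProviderType providerType();
    String accountKey();
    String folderKey();
    Optional<String> providerMessageKey(); // empty if the provider has no stable key
    Optional<String> threadKey();          // empty if the provider has no thread concept
}

// One generic carrier that any provider-specific route could populate.
record GenericMailProviderEnvelope(
        MailProviderType providerType,
        String accountKey,
        String folderKey,
        Optional<String> providerMessageKey,
        Optional<String> threadKey) implements MailProviderEnvelope {
}

// Hypothetical consumer: it sees only the interface, never IMAP-specific types.
final class EnvelopeDescriber {
    private EnvelopeDescriber() {
    }

    static String describe(MailProviderEnvelope envelope) {
        return envelope.providerType() + ":" + envelope.accountKey() + "/" + envelope.folderKey()
                + envelope.providerMessageKey().map(key -> "#" + key).orElse("");
    }
}
```

Because the consumer depends only on the interface, a route for another provider would only need to build an envelope; the downstream pipeline code would stay unchanged.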
2. Provider-aware idempotency foundation
Added MailImportIdentityResolver to derive stable source identifiers for:
- root mail message
- attachment occurrences
Priority for root message identity (highest first):
- provider message key
- Message-ID
- raw MIME hash
This allows the import path to remain restart-safe and replay-safe even when content-hash-only deduplication is insufficient.
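The priority order above can be sketched as a simple fallback chain. This is a minimal sketch under stated assumptions, not the actual MailImportIdentityResolver: the class name RootMessageIdentitySketch, the method signature, the "provider:" / "message-id:" / "mime-sha256:" prefixes and the choice of SHA-256 are all hypothetical.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of the fallback chain; names, prefixes and the hash
// algorithm are assumptions, not taken from the real resolver.
final class RootMessageIdentitySketch {
    private RootMessageIdentitySketch() {
    }

    static String resolve(String providerMessageKey, String messageId, byte[] rawMime) {
        if (hasText(providerMessageKey)) {
            return "provider:" + providerMessageKey.trim();   // 1. provider message key
        }
        if (hasText(messageId)) {
            return "message-id:" + messageId.trim();          // 2. Message-ID header
        }
        return "mime-sha256:" + sha256Hex(rawMime);           // 3. hash of the raw MIME bytes
    }

    private static boolean hasText(String value) {
        return value != null && !value.isBlank();
    }

    static String sha256Hex(byte[] data) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandatory on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }
}
```

A real resolver would probably scope the provider key by account and folder as well, since keys such as IMAP UIDs are only unique within one mailbox; that scoping is omitted here for brevity.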
3. Generic source-id idempotency in document import
GenericDocumentImportService now checks for an existing DOC.doc_source row using:
- source_type
- external_source_id
before content-hash deduplication.
This makes source-identifier idempotency reusable beyond mail as well.
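The lookup order described in this section can be sketched with in-memory maps standing in for the database. This is a minimal sketch, not the real GenericDocumentImportService: the class SourceIdFirstImportSketch, its method, and the behaviour on a content-hash match (attaching the new source to the existing document) are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: in-memory maps replace the DOC.doc_source lookup and
// the content-hash lookup. Behaviour on a content-hash hit is an assumption.
final class SourceIdFirstImportSketch {
    private final Map<String, Long> documentBySourceKey = new HashMap<>();
    private final Map<String, Long> documentByContentHash = new HashMap<>();
    private long nextDocumentId = 1;

    long importDocument(String sourceType, String externalSourceId, String contentHash) {
        // Step 1: source-identifier idempotency (source_type + external_source_id).
        String sourceKey = sourceType + "|" + externalSourceId;
        Long existingBySource = documentBySourceKey.get(sourceKey);
        if (existingBySource != null) {
            return existingBySource;
        }

        // Step 2: content-hash deduplication, reached only for unseen sources.
        Long existingByContent = documentByContentHash.get(contentHash);
        long documentId = (existingByContent != null) ? existingByContent : nextDocumentId++;

        documentByContentHash.putIfAbsent(contentHash, documentId);
        documentBySourceKey.put(sourceKey, documentId);
        return documentId;
    }
}
```

Importing the same source identifier twice returns the same document ID in this sketch, even if the second delivery carries a different content hash, which is the restart and replay case the source-id check is meant to cover.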
4. Typed mail metadata persistence
Added new DOC metadata tables/entities:
- DOC.doc_mail_message
- DOC.doc_mail_recipient
- DOC.doc_mail_attachment
These persist:
- provider/account/folder/message/thread keys
- Message-ID, In-Reply-To, References
- normalized subject
- sender/recipients
- attachment occurrence metadata
- part path / archive path / disposition / content-id
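One way to picture an attachment occurrence is as the combination of the owning message and the attachment's position inside it, so that identical bytes attached to two different mails remain two distinct occurrences. The following is a minimal sketch under that assumption; the record AttachmentOccurrenceSketch, its fields and the identifier format are hypothetical and not taken from the DOC.doc_mail_attachment schema.

```java
// Hypothetical sketch: the identifier layout ("#part=", "!") is invented for
// illustration and is not the format used by the real import path.
record AttachmentOccurrenceSketch(String rootMessageSourceId, String partPath, String archivePath) {

    // Derives a source identifier that is stable for this occurrence.
    String externalSourceId() {
        String id = rootMessageSourceId + "#part=" + partPath;
        boolean insideArchive = archivePath != null && !archivePath.isBlank();
        return insideArchive ? id + "!" + archivePath : id;
    }
}
```

With this shape, the part path separates attachments within one message and the optional archive path separates entries unpacked from one archive attachment.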
5. Attachment source typing
Attachments imported from mail now use:
SourceType.MAIL_ATTACHMENT
instead of the generic MAIL source type.
6. Camel IMAP route integration
The existing Camel mail route now emits generic provider metadata into SourceDescriptor.attributes() using the new provider contract.
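To illustrate how provider metadata might travel as generic attributes, here is a minimal sketch that flattens envelope values into a string map. It assumes that SourceDescriptor.attributes() is map-like; the class ProviderAttributesSketch and all attribute key names are hypothetical and are not the constants defined in MailProviderEnvelopeAttributes.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: key names are invented; the real keys are defined by
// the project's own MailProviderEnvelopeAttributes type.
final class ProviderAttributesSketch {
    static final String PROVIDER_TYPE = "mail.provider.type";
    static final String ACCOUNT_KEY = "mail.provider.account";
    static final String FOLDER_KEY = "mail.provider.folder";
    static final String MESSAGE_KEY = "mail.provider.messageKey";

    private ProviderAttributesSketch() {
    }

    static Map<String, String> toAttributes(
            String providerType, String accountKey, String folderKey, String messageKey) {
        Map<String, String> attributes = new LinkedHashMap<>();
        putIfPresent(attributes, PROVIDER_TYPE, providerType);
        putIfPresent(attributes, ACCOUNT_KEY, accountKey);
        putIfPresent(attributes, FOLDER_KEY, folderKey);
        putIfPresent(attributes, MESSAGE_KEY, messageKey);
        return Collections.unmodifiableMap(attributes);
    }

    // Missing values are skipped so consumers can tell "absent" from "empty".
    private static void putIfPresent(Map<String, String> target, String key, String value) {
        if (value != null && !value.isBlank()) {
            target.put(key, value);
        }
    }
}
```

A route would build such a map once per message and attach it to the source descriptor, so downstream services can read the keys without knowing which provider produced them.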
Not yet included
The following are intentionally left for the next step:
- replay/reprocess workflows
- import/reprocess run tracking tables
- failed attachment retry services
- thread-aware search/reporting
- admin/ops visibility endpoints or Camel admin routes
Main implementation files
New files
- src/main/java/at/procon/dip/ingestion/mail/MailProviderType.java
- src/main/java/at/procon/dip/ingestion/mail/MailProviderEnvelope.java
- src/main/java/at/procon/dip/ingestion/mail/GenericMailProviderEnvelope.java
- src/main/java/at/procon/dip/ingestion/mail/MailProviderEnvelopeAttributes.java
- src/main/java/at/procon/dip/ingestion/mail/MailImportIdentityResolver.java
- src/main/java/at/procon/dip/domain/document/entity/DocumentMailMessage.java
- src/main/java/at/procon/dip/domain/document/entity/DocumentMailRecipient.java
- src/main/java/at/procon/dip/domain/document/entity/DocumentMailAttachment.java
- src/main/java/at/procon/dip/domain/document/entity/MailRecipientType.java
- src/main/java/at/procon/dip/domain/document/repository/DocumentMailMessageRepository.java
- src/main/java/at/procon/dip/domain/document/repository/DocumentMailRecipientRepository.java
- src/main/java/at/procon/dip/domain/document/repository/DocumentMailAttachmentRepository.java
- src/main/java/at/procon/dip/ingestion/service/MailMetadataPersistenceService.java
- src/main/resources/db/migration/V23__doc_mail_processing_stabilization_step1.sql
Modified files
- src/main/java/at/procon/dip/domain/document/SourceType.java
- src/main/java/at/procon/dip/domain/document/repository/DocumentSourceRepository.java
- src/main/java/at/procon/dip/ingestion/service/GenericDocumentImportService.java
- src/main/java/at/procon/dip/ingestion/service/MailMessageExtractionService.java
- src/main/java/at/procon/dip/ingestion/adapter/MailDocumentIngestionAdapter.java
- src/main/java/at/procon/ted/camel/MailRoute.java
Recommended next step
Proceed with Step 2 of the Mail Processing Stabilization Phase:
- replay/reprocess services
- failed attachment retry flow
- import/reprocess run tracking
- reporting / operational visibility