# Mail Processing Stabilization Phase — Step 1
This step implements the first practical slice of the mail-processing stabilization work:
- generic mail-provider contract
- provider-aware source identifiers for idempotent import
- typed mail metadata persistence
- attachment occurrence tracking
- current Camel/IMAP route adapted to the generic provider contract
## Included scope
### 1. Generic mail provider contract
Added a generic abstraction so the ingestion pipeline does not depend on IMAP-specific semantics:
- `MailProviderType`
- `MailProviderEnvelope`
- `GenericMailProviderEnvelope`
- `MailProviderEnvelopeAttributes`

The current implementation uses `GenericMailProviderEnvelope` for the existing Camel IMAP route. Future providers such as POP3, EWS, Microsoft Graph, the Gmail API, or replay/file sources can implement the same contract.
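
A minimal sketch of how such a provider contract could be shaped. The enum constants, the accessor names on `MailProviderEnvelope`, and the record-based `GenericMailProviderEnvelope` below are assumptions for illustration, not the actual signatures in the codebase.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical provider kinds mirroring the providers named above; real constants may differ.
enum MailProviderType { IMAP, POP3, EWS, MICROSOFT_GRAPH, GMAIL_API, FILE_REPLAY }

// Hypothetical provider-neutral view of one fetched message.
interface MailProviderEnvelope {
    MailProviderType providerType();

    String accountKey();                      // mailbox/account the message was read from

    Optional<String> folderKey();             // folder or label, if the provider has one

    Optional<String> providerMessageKey();    // provider-native key (e.g. a folder-scoped IMAP UID)

    byte[] rawMime();                         // the complete raw MIME message

    Map<String, String> providerAttributes(); // provider-specific extras, kept as strings
}

// Hypothetical record-based implementation; the record accessors satisfy the interface methods.
record GenericMailProviderEnvelope(
        MailProviderType providerType,
        String accountKey,
        Optional<String> folderKey,
        Optional<String> providerMessageKey,
        byte[] rawMime,
        Map<String, String> providerAttributes) implements MailProviderEnvelope {
}
```

In this sketch, a file-replay source would pass `Optional.empty()` for `providerMessageKey`, leaving identity resolution to the `Message-ID` or the raw MIME hash.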
### 2. Provider-aware idempotency foundation
Added `MailImportIdentityResolver` to derive stable source identifiers for:
- root mail message
- attachment occurrences
Resolution order for the root message identity (the first available value wins):
1. provider message key
2. `Message-ID`
3. raw MIME hash

This keeps the import path restart-safe and replay-safe: fetching the same message again resolves to the same source identifier, even in cases where content-hash deduplication alone is insufficient.
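
The fallback chain can be sketched as follows. The class name, the identifier prefixes (`key:`, `msgid:`, `sha256:`), and the attachment scheme of root identity plus MIME part path are hypothetical; only the priority order comes from this document.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Optional;

// Hypothetical sketch of the resolver's fallback chain; names and id formats are assumptions.
final class MailIdentitySketch {
    private MailIdentitySketch() {}

    static String resolveRootId(String providerType, String accountKey,
                                Optional<String> providerMessageKey,
                                Optional<String> messageIdHeader,
                                byte[] rawMime) {
        // 1. Provider message key: most precise, scoped by provider and account.
        if (providerMessageKey.isPresent() && !providerMessageKey.get().isBlank()) {
            return providerType + ":" + accountKey + ":key:" + providerMessageKey.get();
        }
        // 2. Message-ID header, with surrounding whitespace and angle brackets stripped.
        if (messageIdHeader.isPresent()) {
            String mid = messageIdHeader.get().trim().replaceAll("^<|>$", "");
            if (!mid.isEmpty()) {
                return "msgid:" + mid;
            }
        }
        // 3. Raw MIME hash: last resort, stable only for byte-identical input.
        return "sha256:" + sha256Hex(rawMime);
    }

    // Attachment occurrence = root identity + MIME part path (assumed scheme).
    static String resolveAttachmentId(String rootId, String partPath) {
        return rootId + "#part:" + partPath;
    }

    static String sha256Hex(byte[] data) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

Whether the `Message-ID` fallback should also be scoped by account is a design decision this sketch leaves open.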
### 3. Generic source-id idempotency in document import
`GenericDocumentImportService` now checks for an existing `DOC.doc_source` row using the following key before falling back to content-hash deduplication:

- `source_type`
- `external_source_id`

This makes source-identifier idempotency reusable for non-mail sources as well.
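
The lookup order can be illustrated with an in-memory sketch. The maps stand in for the real repositories, and the behavior on a content-hash match (linking the new source occurrence to the existing document) is an assumption; this is not the actual service or repository API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory stand-in for the DOC.doc_source / document lookups;
// not the real GenericDocumentImportService or repository API.
final class ImportDedupSketch {
    record SourceKey(String sourceType, String externalSourceId) {}

    private final Map<SourceKey, UUID> documentBySource = new HashMap<>();
    private final Map<String, UUID> documentByContentHash = new HashMap<>();

    // Returns the document id for this source occurrence, creating a document only if needed.
    UUID importDocument(String sourceType, String externalSourceId, String contentHash) {
        SourceKey key = new SourceKey(sourceType, externalSourceId);

        // 1. Source-identifier idempotency: this exact occurrence was imported before
        //    (restart or replay), so return the existing document without further work.
        UUID existing = documentBySource.get(key);
        if (existing != null) {
            return existing;
        }

        // 2. Content-hash deduplication: a new occurrence whose bytes match a stored document
        //    reuses that document (assumed behavior); otherwise a new document id is minted.
        UUID documentId = documentByContentHash.computeIfAbsent(contentHash, h -> UUID.randomUUID());

        // Record the occurrence so a later replay is answered by step 1.
        documentBySource.put(key, documentId);
        return documentId;
    }

    int sourceCount() {
        return documentBySource.size();
    }
}
```

In this sketch, a replay is recognized from the source key alone, while two distinct occurrences with identical bytes still get separate source entries pointing at one document.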
### 4. Typed mail metadata persistence
Added new DOC metadata tables/entities:
- `DOC.doc_mail_message`
- `DOC.doc_mail_recipient`
- `DOC.doc_mail_attachment`

These persist:
- provider/account/folder/message/thread keys
- `Message-ID`, `In-Reply-To`, `References`
- normalized subject
- sender/recipients
- attachment occurrence metadata
- part path / archive path / disposition / content-id
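
This document does not define how the subject is normalized. The sketch below assumes one common approach: strip stacked reply/forward prefixes (including the German `AW:`/`WG:`), collapse whitespace, and lower-case the result. The class name and all rules are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical subject normalization; the actual rules in the codebase may differ.
final class SubjectNormalizer {
    // One leading reply/forward marker such as "Re:", "RE[2]:", "Fwd:", "FW:", "AW:", "WG:".
    private static final Pattern PREFIX =
            Pattern.compile("^\\s*(re|fwd|fw|aw|wg)(\\[\\d+\\])?\\s*:\\s*", Pattern.CASE_INSENSITIVE);

    private SubjectNormalizer() {}

    static String normalize(String subject) {
        if (subject == null) {
            return "";
        }
        String s = subject;
        // Strip stacked prefixes one at a time ("Re: Fwd: Re: ...") until none is left.
        while (true) {
            String stripped = PREFIX.matcher(s).replaceFirst("");
            if (stripped.equals(s)) {
                break;
            }
            s = stripped;
        }
        // Collapse internal whitespace and lower-case so variants group together.
        return s.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }
}
```

A subject such as `Report: Q3` is left intact apart from lower-casing, because the prefix pattern requires the colon to follow the marker directly.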
### 5. Attachment source typing
Attachments imported from mail now use `SourceType.MAIL_ATTACHMENT` instead of the generic `MAIL` source type.
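
A sketch of how the root message and its attachment occurrences could map onto distinct `(source_type, external_source_id)` keys. Only the `MAIL` and `MAIL_ATTACHMENT` constants come from this document; the reduced enum, the `SourceKey` record, the `MailSourceKeys` helper, and the `#part:` id format are hypothetical.

```java
// Hypothetical reduced SourceType: MAIL and MAIL_ATTACHMENT are named in this document;
// the real enum contains further constants.
enum SourceType { MAIL, MAIL_ATTACHMENT }

// Hypothetical (source_type, external_source_id) pair as used for the doc_source lookup.
record SourceKey(SourceType sourceType, String externalSourceId) {}

final class MailSourceKeys {
    private MailSourceKeys() {}

    // Root message: keyed by the resolved root identity.
    static SourceKey forMessage(String rootId) {
        return new SourceKey(SourceType.MAIL, rootId);
    }

    // Attachment occurrence: typed separately, keyed by root identity plus MIME part path (assumed format).
    static SourceKey forAttachment(String rootId, String partPath) {
        return new SourceKey(SourceType.MAIL_ATTACHMENT, rootId + "#part:" + partPath);
    }
}
```

With a dedicated type, attachment documents can be filtered or counted separately from root messages without parsing `external_source_id`.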
### 6. Camel IMAP route integration
The existing Camel mail route now emits generic provider metadata into `SourceDescriptor.attributes()` using the new provider contract.
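
The flattening of provider metadata into a string attribute map can be sketched in plain Java, without any Camel dependency. The key names below are hypothetical; the project's real keys are presumably defined alongside `MailProviderEnvelopeAttributes` and may differ.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical attribute keys; the project's real key constants may differ.
final class EnvelopeAttributeKeys {
    static final String PROVIDER_TYPE = "mail.provider.type";
    static final String ACCOUNT_KEY = "mail.provider.account";
    static final String FOLDER_KEY = "mail.provider.folder";
    static final String MESSAGE_KEY = "mail.provider.messageKey";

    private EnvelopeAttributeKeys() {}

    // Flattens provider metadata into an ordered, read-only string map, skipping absent values.
    static Map<String, String> toAttributes(String providerType, String account,
                                            String folder, String messageKey) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put(PROVIDER_TYPE, Objects.requireNonNull(providerType, "providerType"));
        putIfPresent(attrs, ACCOUNT_KEY, account);
        putIfPresent(attrs, FOLDER_KEY, folder);
        putIfPresent(attrs, MESSAGE_KEY, messageKey);
        return Collections.unmodifiableMap(attrs);
    }

    private static void putIfPresent(Map<String, String> attrs, String key, String value) {
        if (value != null && !value.isBlank()) {
            attrs.put(key, value);
        }
    }
}
```

Keeping the attributes as plain strings means the source descriptor does not need to know about any particular provider; in this sketch, the consuming side reads the same keys back whether the message came from IMAP or a future provider.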
## Not yet included
The following are intentionally left for the next step:
- replay/reprocess workflows
- import/reprocess run tracking tables
- failed attachment retry services
- thread-aware search/reporting
- admin/ops visibility endpoints or Camel admin routes
## Main implementation files
### New files
- `src/main/java/at/procon/dip/ingestion/mail/MailProviderType.java`
- `src/main/java/at/procon/dip/ingestion/mail/MailProviderEnvelope.java`
- `src/main/java/at/procon/dip/ingestion/mail/GenericMailProviderEnvelope.java`
- `src/main/java/at/procon/dip/ingestion/mail/MailProviderEnvelopeAttributes.java`
- `src/main/java/at/procon/dip/ingestion/mail/MailImportIdentityResolver.java`
- `src/main/java/at/procon/dip/domain/document/entity/DocumentMailMessage.java`
- `src/main/java/at/procon/dip/domain/document/entity/DocumentMailRecipient.java`
- `src/main/java/at/procon/dip/domain/document/entity/DocumentMailAttachment.java`
- `src/main/java/at/procon/dip/domain/document/entity/MailRecipientType.java`
- `src/main/java/at/procon/dip/domain/document/repository/DocumentMailMessageRepository.java`
- `src/main/java/at/procon/dip/domain/document/repository/DocumentMailRecipientRepository.java`
- `src/main/java/at/procon/dip/domain/document/repository/DocumentMailAttachmentRepository.java`
- `src/main/java/at/procon/dip/ingestion/service/MailMetadataPersistenceService.java`
- `src/main/resources/db/migration/V23__doc_mail_processing_stabilization_step1.sql`
### Modified files
- `src/main/java/at/procon/dip/domain/document/SourceType.java`
- `src/main/java/at/procon/dip/domain/document/repository/DocumentSourceRepository.java`
- `src/main/java/at/procon/dip/ingestion/service/GenericDocumentImportService.java`
- `src/main/java/at/procon/dip/ingestion/service/MailMessageExtractionService.java`
- `src/main/java/at/procon/dip/ingestion/adapter/MailDocumentIngestionAdapter.java`
- `src/main/java/at/procon/ted/camel/MailRoute.java`
## Recommended next step
Proceed with **Step 2** of the Mail Processing Stabilization Phase:
- replay/reprocess services
- failed attachment retry flow
- import/reprocess run tracking
- reporting / operational visibility