# Mail Processing Stabilization Phase — Step 1

This step implements the first practical slice of the mail-processing stabilization work:

- generic mail-provider contract
- provider-aware source identifiers for idempotent import
- typed mail metadata persistence
- attachment occurrence tracking
- current Camel/IMAP route adapted to the generic provider contract
## Included scope

### 1. Generic mail provider contract

Added a generic abstraction so that the ingestion pipeline does not depend on IMAP-specific semantics:

- `MailProviderType`
- `MailProviderEnvelope`
- `GenericMailProviderEnvelope`
- `MailProviderEnvelopeAttributes`

The current implementation uses `GenericMailProviderEnvelope` for the existing Camel IMAP route.

Future providers such as POP3, EWS, Microsoft Graph, the Gmail API, or replay/file sources can use the same contract.
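
The contract is easier to picture as code. The following is an illustrative sketch only: the type names come from the list above, but every member, the enum constants, and the `ImapEnvelopes.fromImap` helper are assumptions, not the project's actual API. It also assumes an IMAP message key built from `UIDVALIDITY` and `UID`, which together identify a message within one IMAP mailbox.

```java
import java.time.Instant;
import java.util.Optional;

// Hypothetical constants; the real MailProviderType values are not listed in this note.
enum MailProviderType { IMAP, POP3, EWS, MICROSOFT_GRAPH, GMAIL_API, REPLAY_FILE }

// Provider-neutral view of one fetched message (accessor names are assumptions).
interface MailProviderEnvelope {
    MailProviderType providerType();
    String accountKey();                   // mailbox or account identifier
    String folderKey();                    // folder, label, or replay source name
    Optional<String> providerMessageKey(); // provider-native message key, if the provider has one
    Optional<String> threadKey();          // provider-native conversation/thread key, if any
    Instant receivedAt();
    byte[] rawMime();
}

// Immutable carrier that any provider adapter can fill (shape assumed for illustration).
record GenericMailProviderEnvelope(
        MailProviderType providerType,
        String accountKey,
        String folderKey,
        Optional<String> providerMessageKey,
        Optional<String> threadKey,
        Instant receivedAt,
        byte[] rawMime) implements MailProviderEnvelope {
}

final class ImapEnvelopes {
    private ImapEnvelopes() {}

    // Hypothetical factory: UIDVALIDITY + UID identify a message within one IMAP mailbox.
    static MailProviderEnvelope fromImap(String account, String folder,
                                         long uidValidity, long uid, byte[] rawMime) {
        return new GenericMailProviderEnvelope(
                MailProviderType.IMAP, account, folder,
                Optional.of(uidValidity + ":" + uid),
                Optional.empty(),
                Instant.now(),
                rawMime);
    }
}
```

A POP3 or Microsoft Graph adapter would fill the same record with its own native key (for example a POP3 `UIDL` value), so code downstream of the envelope does not need to know which protocol produced the message.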
### 2. Provider-aware idempotency foundation

Added `MailImportIdentityResolver` to derive stable source identifiers for:

- the root mail message
- attachment occurrences

Root message identity is resolved in priority order (the first available value is used):

1. provider message key
2. `Message-ID`
3. raw MIME hash

This keeps the import path restart-safe and replay-safe even when content-hash-only deduplication is insufficient.
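
A minimal sketch of the fallback order, assuming the resolver returns a prefixed string identifier. The prefixes, the scoping of the provider key to provider, account, and folder, and the use of the MIME part path for attachment occurrences are illustrative assumptions; the real `MailImportIdentityResolver` signature is not documented in this note.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical stand-in for MailImportIdentityResolver; prefixes and scoping are assumptions.
final class MailIdentitySketch {
    private MailIdentitySketch() {}

    // Root identity, first available wins: provider key, then Message-ID, then raw MIME hash.
    static String resolveRootId(String provider, String account, String folder,
                                String providerMessageKey, String messageIdHeader, byte[] rawMime) {
        if (hasText(providerMessageKey)) {
            // Scoped to provider/account/folder because keys such as IMAP UIDs are folder-local.
            return "provider:" + provider + ":" + account + ":" + folder + ":" + providerMessageKey;
        }
        if (hasText(messageIdHeader)) {
            return "message-id:" + stripAngleBrackets(messageIdHeader.trim());
        }
        return "mime-sha256:" + sha256Hex(rawMime);
    }

    // Attachment occurrence identity: the same bytes in two messages are two occurrences.
    static String resolveAttachmentId(String rootId, String mimePartPath) {
        return rootId + "#part:" + mimePartPath;
    }

    private static boolean hasText(String value) {
        return value != null && !value.isBlank();
    }

    private static String stripAngleBrackets(String value) {
        if (value.length() >= 2 && value.startsWith("<") && value.endsWith(">")) {
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    private static String sha256Hex(byte[] data) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(data));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

Scoping the provider key by account and folder avoids collisions between folder-local keys such as IMAP UIDs; under this assumption, the trade-off is that a message moved to another folder receives a new provider key.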
### 3. Generic source-id idempotency in document import

`GenericDocumentImportService` now checks for an existing `DOC.doc_source` row matching:

- `source_type`
- `external_source_id`

This check runs before content-hash deduplication, which makes source-identifier idempotency reusable for sources other than mail as well.
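
The ordering can be illustrated with an in-memory model. This is a hypothetical stand-in for the repository lookups in `GenericDocumentImportService`: the maps, the `Outcome` enum, and the choice to link a new source row to an already existing document on a content-hash hit are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for DOC.doc_source lookups; the real service uses JPA repositories.
final class ImportDedupSketch {

    record SourceKey(String sourceType, String externalSourceId) {}

    enum Outcome { EXISTING_SOURCE, DUPLICATE_CONTENT, IMPORTED }

    private final Map<SourceKey, Long> documentsBySource = new HashMap<>();
    private final Map<String, Long> documentsByContentHash = new HashMap<>();
    private long nextDocumentId = 1;

    Outcome importDocument(String sourceType, String externalSourceId, String contentHash) {
        SourceKey key = externalSourceId == null ? null : new SourceKey(sourceType, externalSourceId);

        // 1. Source-identifier idempotency: this exact source item was imported before.
        if (key != null && documentsBySource.containsKey(key)) {
            return Outcome.EXISTING_SOURCE;
        }

        // 2. Content-hash deduplication: a new source item whose content already exists.
        //    Assumption: the new source row is linked to the existing document.
        Long existingDocument = documentsByContentHash.get(contentHash);
        if (existingDocument != null) {
            if (key != null) {
                documentsBySource.put(key, existingDocument);
            }
            return Outcome.DUPLICATE_CONTENT;
        }

        // 3. Genuinely new content: create a document and register both lookups.
        long documentId = nextDocumentId++;
        documentsByContentHash.put(contentHash, documentId);
        if (key != null) {
            documentsBySource.put(key, documentId);
        }
        return Outcome.IMPORTED;
    }
}
```

Calling `importDocument("MAIL", "message-id:a@x", "h1")` and then repeating it with a different hash returns `EXISTING_SOURCE`: a re-fetched message whose bytes changed is still recognized by its source identifier, which content-hash-only deduplication would miss.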
### 4. Typed mail metadata persistence

Added new DOC metadata tables/entities:

- `DOC.doc_mail_message`
- `DOC.doc_mail_recipient`
- `DOC.doc_mail_attachment`

These persist:

- provider/account/folder/message/thread keys
- `Message-ID`, `In-Reply-To`, `References`
- normalized subject
- sender/recipients
- attachment occurrence metadata
- part path / archive path / disposition / content-id
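
For orientation, the three tables map naturally onto a message/recipient/attachment structure. The sketch uses plain records instead of JPA entities so that it stays self-contained; all field names, the `MailRecipientType` constants, and the subject-normalization rule (stripping repeated `Re:`/`Fwd:` prefixes, plus the German `AW:`/`WG:` variants) are assumptions, not the project's schema.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical values; the actual MailRecipientType constants are not listed in this note.
enum MailRecipientType { TO, CC, BCC, REPLY_TO }

record DocumentMailRecipient(MailRecipientType type, String address, String displayName) {}

// One attachment occurrence within one message (field names assumed).
record DocumentMailAttachment(String partPath, String archivePath, String fileName,
                              String disposition, String contentId, String externalSourceId) {}

record DocumentMailMessage(String providerType, String accountKey, String folderKey,
                           String providerMessageKey, String threadKey,
                           String messageId, String inReplyTo, List<String> references,
                           String subject, String normalizedSubject,
                           String sender, List<DocumentMailRecipient> recipients,
                           List<DocumentMailAttachment> attachments) {}

final class SubjectNormalizer {
    private SubjectNormalizer() {}

    // Assumed rule: strip leading reply/forward prefixes such as "Re:", "Fwd:", "RE[2]:", "AW:", "WG:".
    private static final Pattern PREFIX = Pattern.compile(
            "^\\s*(?:(?:re|fw|fwd|aw|wg)\\s*(?:\\[\\d+\\])?\\s*:\\s*)+", Pattern.CASE_INSENSITIVE);

    static String normalize(String subject) {
        if (subject == null) {
            return "";
        }
        String stripped = PREFIX.matcher(subject).replaceFirst("");
        return stripped.replaceAll("\\s+", " ").trim(); // collapse internal whitespace
    }
}
```

Storing the normalized subject next to `Message-ID`, `In-Reply-To`, and `References` presumably supports the thread-aware search listed under "Not yet included", since subject matching is a common fallback when reference headers are missing.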
### 5. Attachment source typing

Attachments imported from mail now use `SourceType.MAIL_ATTACHMENT` instead of the generic `MAIL` source type.
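
One effect of the separate type, sketched here under the assumption that the idempotency key is exactly the `source_type` plus `external_source_id` pair checked against `DOC.doc_source`: a root message and an attachment occurrence cannot collide on that key, even if their identifier strings happened to match. `SourceKey` and `MailSourceTyping` are hypothetical names, and only the two `SourceType` constants mentioned in this note are shown.

```java
import java.util.Objects;

// Only the two constants named in this note; other source types are omitted.
enum SourceType { MAIL, MAIL_ATTACHMENT }

// Hypothetical idempotency key mirroring the (source_type, external_source_id) lookup.
record SourceKey(SourceType sourceType, String externalSourceId) {
    SourceKey {
        Objects.requireNonNull(sourceType, "sourceType");
        Objects.requireNonNull(externalSourceId, "externalSourceId");
    }
}

final class MailSourceTyping {
    private MailSourceTyping() {}

    static SourceKey forRootMessage(String rootId) {
        return new SourceKey(SourceType.MAIL, rootId);
    }

    static SourceKey forAttachment(String attachmentOccurrenceId) {
        return new SourceKey(SourceType.MAIL_ATTACHMENT, attachmentOccurrenceId);
    }
}
```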
### 6. Camel IMAP route integration

The existing Camel mail route now emits generic provider metadata into `SourceDescriptor.attributes()` using the new provider contract.
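
How the provider metadata might travel through the route: the sketch flattens envelope fields into a string map and reads them back. It assumes `SourceDescriptor.attributes()` is a `Map<String, String>`; the attribute key names and the `MailProviderEnvelopeAttributesSketch` class are hypothetical, not the contents of `MailProviderEnvelopeAttributes`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical key names; the real constants in MailProviderEnvelopeAttributes are not documented here.
final class MailProviderEnvelopeAttributesSketch {
    static final String PROVIDER_TYPE = "mail.provider.type";
    static final String ACCOUNT_KEY = "mail.provider.account";
    static final String FOLDER_KEY = "mail.provider.folder";
    static final String MESSAGE_KEY = "mail.provider.message-key";
    static final String THREAD_KEY = "mail.provider.thread-key";

    private MailProviderEnvelopeAttributesSketch() {}

    // Flattens envelope fields into an immutable string map; null values are simply omitted.
    static Map<String, String> toAttributes(String providerType, String account, String folder,
                                            String messageKey, String threadKey) {
        Map<String, String> attributes = new HashMap<>();
        attributes.put(PROVIDER_TYPE, Objects.requireNonNull(providerType, "providerType"));
        putIfPresent(attributes, ACCOUNT_KEY, account);
        putIfPresent(attributes, FOLDER_KEY, folder);
        putIfPresent(attributes, MESSAGE_KEY, messageKey);
        putIfPresent(attributes, THREAD_KEY, threadKey);
        return Map.copyOf(attributes);
    }

    // Downstream code reads attributes without knowing which provider produced them.
    static Optional<String> read(Map<String, String> attributes, String key) {
        return Optional.ofNullable(attributes.get(key));
    }

    private static void putIfPresent(Map<String, String> target, String key, String value) {
        if (value != null) {
            target.put(key, value);
        }
    }
}
```

Keeping the map free of Camel or IMAP types means an adapter such as `MailDocumentIngestionAdapter` can read the same keys whichever provider populated them.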
## Not yet included

The following are intentionally left for the next step:

- replay/reprocess workflows
- import/reprocess run tracking tables
- failed attachment retry services
- thread-aware search/reporting
- admin/ops visibility endpoints or Camel admin routes
## Main implementation files

### New files

- `src/main/java/at/procon/dip/ingestion/mail/MailProviderType.java`
- `src/main/java/at/procon/dip/ingestion/mail/MailProviderEnvelope.java`
- `src/main/java/at/procon/dip/ingestion/mail/GenericMailProviderEnvelope.java`
- `src/main/java/at/procon/dip/ingestion/mail/MailProviderEnvelopeAttributes.java`
- `src/main/java/at/procon/dip/ingestion/mail/MailImportIdentityResolver.java`
- `src/main/java/at/procon/dip/domain/document/entity/DocumentMailMessage.java`
- `src/main/java/at/procon/dip/domain/document/entity/DocumentMailRecipient.java`
- `src/main/java/at/procon/dip/domain/document/entity/DocumentMailAttachment.java`
- `src/main/java/at/procon/dip/domain/document/entity/MailRecipientType.java`
- `src/main/java/at/procon/dip/domain/document/repository/DocumentMailMessageRepository.java`
- `src/main/java/at/procon/dip/domain/document/repository/DocumentMailRecipientRepository.java`
- `src/main/java/at/procon/dip/domain/document/repository/DocumentMailAttachmentRepository.java`
- `src/main/java/at/procon/dip/ingestion/service/MailMetadataPersistenceService.java`
- `src/main/resources/db/migration/V23__doc_mail_processing_stabilization_step1.sql`

### Modified files

- `src/main/java/at/procon/dip/domain/document/SourceType.java`
- `src/main/java/at/procon/dip/domain/document/repository/DocumentSourceRepository.java`
- `src/main/java/at/procon/dip/ingestion/service/GenericDocumentImportService.java`
- `src/main/java/at/procon/dip/ingestion/service/MailMessageExtractionService.java`
- `src/main/java/at/procon/dip/ingestion/adapter/MailDocumentIngestionAdapter.java`
- `src/main/java/at/procon/ted/camel/MailRoute.java`
## Recommended next step

Proceed with **Step 2** of the Mail Processing Stabilization Phase:

- replay/reprocess services
- failed attachment retry flow
- import/reprocess run tracking
- reporting / operational visibility