# Phase 4 - Generic Ingestion Pipeline

Phase 4 introduces the first generalized ingestion flow on top of the DOC backbone.

## What is included

- generic ingestion gateway with adapter selection
- file-system ingestion adapter and Camel route
- REST/API upload controller for arbitrary documents
- document type detection by media type / extension
- first extractors for:
  - plain text / markdown / generic XML
  - HTML
  - PDF
  - binary fallback
- default representation builder for non-TED documents
- binary payload support in `DOC.doc_content.binary_content`
- automatic creation of pending generic embeddings for imported representations
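
The adapter selection performed by the gateway could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `IngestionAdapter`, `IngestionGateway`, the `supports`/`name` methods, and the source-kind strings `FILE` and `UPLOAD` are all hypothetical names chosen for this example.

```java
import java.util.List;
import java.util.Optional;

/** Hypothetical adapter contract: each adapter declares which source kind it handles. */
interface IngestionAdapter {
    boolean supports(String sourceKind);

    String name();

    /** Convenience factory for an adapter bound to exactly one source kind. */
    static IngestionAdapter of(String name, String kind) {
        return new IngestionAdapter() {
            @Override
            public boolean supports(String sourceKind) {
                return kind.equals(sourceKind);
            }

            @Override
            public String name() {
                return name;
            }
        };
    }
}

/** Hypothetical gateway: picks the first registered adapter that accepts the source. */
final class IngestionGateway {
    private final List<IngestionAdapter> adapters;

    IngestionGateway(List<IngestionAdapter> adapters) {
        this.adapters = List.copyOf(adapters);
    }

    /** Returns the first matching adapter in registration order, or empty if none matches. */
    Optional<IngestionAdapter> select(String sourceKind) {
        return adapters.stream()
                .filter(adapter -> adapter.supports(sourceKind))
                .findFirst();
    }
}
```

Returning an `Optional` leaves the decision about unsupported sources (reject, log, or route to a fallback) to the caller.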

## Important behavior

- current TED runtime remains intact
- generic ingestion is disabled by default and must be enabled with:
  - `ted.generic-ingestion.enabled=true`
- file-system polling is separately controlled with:
  - `ted.generic-ingestion.file-system-enabled=true`
- REST/API upload endpoints are under:
  - `/api/v1/dip/import/upload`
  - `/api/v1/dip/import/text`
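
As a rough client-side illustration, a request against the text endpoint might be built with the JDK HTTP client as shown below. Only the path `/api/v1/dip/import/text` comes from this document; the HTTP method (`POST`), the `text/plain` content type, the raw-text body, and the base URL are assumptions, so check the controller for the real contract before relying on this.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that builds (but does not send) a text import request. */
final class ImportTextRequestSketch {

    // Path taken from the endpoint list above.
    static final String IMPORT_TEXT_PATH = "/api/v1/dip/import/text";

    /**
     * Builds a request carrying the given text.
     * Assumption: the endpoint accepts a POST with a plain-text body.
     */
    static HttpRequest build(String baseUrl, String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + IMPORT_TEXT_PATH))
                .header("Content-Type", "text/plain; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(text))
                .build();
    }
}
```

The request can then be sent with `java.net.http.HttpClient#send`, for example against a locally running instance with generic ingestion enabled.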

## Currently supported generic document types

- PDF
- HTML
- TEXT
- MARKDOWN
- XML_GENERIC
- UNKNOWN text-like files
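
Detection by media type with an extension fallback could look roughly like the sketch below. The type names mirror the list above, but the enum `GenericDocumentType`, the class `DocumentTypeDetector`, and the concrete media-type and extension mappings are assumptions made for illustration; the real detector may use different tables or precedence.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical enum mirroring the supported types listed above. */
enum GenericDocumentType { PDF, HTML, TEXT, MARKDOWN, XML_GENERIC, UNKNOWN }

final class DocumentTypeDetector {

    // Assumed media-type mapping; extend as needed.
    private static final Map<String, GenericDocumentType> BY_MEDIA_TYPE = Map.of(
            "application/pdf", GenericDocumentType.PDF,
            "text/html", GenericDocumentType.HTML,
            "text/plain", GenericDocumentType.TEXT,
            "text/markdown", GenericDocumentType.MARKDOWN,
            "application/xml", GenericDocumentType.XML_GENERIC,
            "text/xml", GenericDocumentType.XML_GENERIC);

    // Assumed extension mapping, consulted only when the media type is inconclusive.
    private static final Map<String, GenericDocumentType> BY_EXTENSION = Map.of(
            "pdf", GenericDocumentType.PDF,
            "html", GenericDocumentType.HTML,
            "htm", GenericDocumentType.HTML,
            "txt", GenericDocumentType.TEXT,
            "md", GenericDocumentType.MARKDOWN,
            "markdown", GenericDocumentType.MARKDOWN,
            "xml", GenericDocumentType.XML_GENERIC);

    private DocumentTypeDetector() {
    }

    /** Tries the media type first, then the file extension, else returns UNKNOWN. */
    static GenericDocumentType detect(String mediaType, String fileName) {
        if (mediaType != null) {
            // Drop parameters such as "; charset=UTF-8" before the lookup.
            String normalized = mediaType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            GenericDocumentType byMediaType = BY_MEDIA_TYPE.get(normalized);
            if (byMediaType != null) {
                return byMediaType;
            }
        }
        if (fileName != null) {
            int dot = fileName.lastIndexOf('.');
            if (dot >= 0 && dot < fileName.length() - 1) {
                String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
                return BY_EXTENSION.getOrDefault(extension, GenericDocumentType.UNKNOWN);
            }
        }
        return GenericDocumentType.UNKNOWN;
    }
}
```

In this sketch a generic media type such as `application/octet-stream` falls through to the extension, so an uploaded `report.PDF` would still be classified as `PDF`.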

DOCX, ZIP child extraction, and MIME body parsing are intentionally left for later phases.