# Phase 4 - Generic Ingestion Pipeline

Phase 4 introduces the first generalized ingestion flow on top of the DOC backbone.

## What is included

- generic ingestion gateway with adapter selection
- file-system ingestion adapter and Camel route
- REST/API upload controller for arbitrary documents
- document type detection by media type / extension
- first extractors for:
  - plain text / markdown / generic XML
  - HTML
  - PDF
  - binary fallback
- default representation builder for non-TED documents
- binary payload support in `DOC.doc_content.binary_content`
- automatic creation of pending generic embeddings for imported representations

## Important behavior

- current TED runtime remains intact
- generic ingestion is disabled by default and must be enabled with:
  - `ted.generic-ingestion.enabled=true`
- file-system polling is separately controlled with:
  - `ted.generic-ingestion.file-system-enabled=true`
- REST/API upload endpoints are under:
  - `/api/v1/dip/import/upload`
  - `/api/v1/dip/import/text`

## Current supported generic document types

- PDF
- HTML
- TEXT
- MARKDOWN
- XML_GENERIC
- UNKNOWN text-like files

DOCX, ZIP child extraction, and MIME body parsing are intentionally left for later phases.
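
For illustration, detection "by media type / extension" as listed under "What is included" could look roughly like the sketch below. This is not the project's actual code: the class name `DocumentTypeDetector`, the `detect` method, and both lookup tables are hypothetical, and the precedence (media type first, file extension as fallback) is an assumption. Only the enum constants mirror the document types named above.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch; names and mappings are assumptions, not the real implementation.
class DocumentTypeDetector {

    /** Generic document types as listed in this document. */
    enum DocumentType { PDF, HTML, TEXT, MARKDOWN, XML_GENERIC, UNKNOWN }

    // Assumed media type mapping (illustrative, not exhaustive).
    private static final Map<String, DocumentType> BY_MEDIA_TYPE = Map.of(
            "application/pdf", DocumentType.PDF,
            "text/html", DocumentType.HTML,
            "text/plain", DocumentType.TEXT,
            "text/markdown", DocumentType.MARKDOWN,
            "application/xml", DocumentType.XML_GENERIC,
            "text/xml", DocumentType.XML_GENERIC);

    // Assumed file extension mapping, used only when the media type is not conclusive.
    private static final Map<String, DocumentType> BY_EXTENSION = Map.of(
            "pdf", DocumentType.PDF,
            "html", DocumentType.HTML,
            "htm", DocumentType.HTML,
            "txt", DocumentType.TEXT,
            "md", DocumentType.MARKDOWN,
            "xml", DocumentType.XML_GENERIC);

    /** Tries the media type first, then the file extension, else returns UNKNOWN. */
    static DocumentType detect(String mediaType, String fileName) {
        if (mediaType != null) {
            // Drop parameters such as "; charset=UTF-8" before the lookup.
            String base = mediaType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
            DocumentType byMedia = BY_MEDIA_TYPE.get(base);
            if (byMedia != null) {
                return byMedia;
            }
        }
        if (fileName != null) {
            int dot = fileName.lastIndexOf('.');
            if (dot >= 0 && dot < fileName.length() - 1) {
                String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
                DocumentType byExt = BY_EXTENSION.get(ext);
                if (byExt != null) {
                    return byExt;
                }
            }
        }
        return DocumentType.UNKNOWN;
    }
}
```

In this sketch a generic media type such as `application/octet-stream` falls through to the extension lookup, so `detect("application/octet-stream", "report.pdf")` resolves to `PDF`, while an unrecognized pair resolves to `UNKNOWN`. Whether the real detector then routes `UNKNOWN` to the text-like handling or to the binary fallback extractor is not specified here.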