# Wave 2 — Extended TED structured search in NEW runtime

## What was added

This extension adds the parts that were missing from the earlier Wave 2 proposal:

1. **Projection-aware TED structured search in NEW runtime**
   - endpoint: `GET /v1/documents/search`
   - endpoint: `POST /v1/documents/search`
   - active only when `dip.runtime.mode=NEW`

2. **Repository-level joins across NEW projection model**
   - `DOC.doc_document`
   - `TED.ted_notice_projection`
   - `TED.ted_notice_lot`
   - `TED.ted_notice_organization`

3. **Extended TED structured filters**
   - `countryCode`, `countryCodes`
   - `noticeType`
   - `contractNature`
   - `procedureType`
   - `cpvPrefix`, `cpvCodes`
   - `nutsCode`, `nutsCodes`
   - `publicationDateFrom`, `publicationDateTo`
   - `submissionDeadlineAfter`
   - `euFunded`
   - `buyerNameContains`
   - `projectTitleContains`

4. **Hybrid ranking path**
   - structured filters first narrow the candidate `document_id` set
   - generic NEW lexical/trigram/semantic search ranks only inside that candidate set
   - request parameter `q` is used as the hybrid query text
   - `similarityThreshold` is forwarded as a per-request semantic threshold override

5. **Facets**
   - countries
   - notice types
   - procedure types
   - buyers
   - publication months (`YYYY-MM`)
   - CPV families (first two digits of the CPV code)

6. **Parity coverage**
   - NEW structured-only parity test against legacy `SearchService` for shared filters
   - NEW endpoint integration test for structured results + facets
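
As an illustration of how the structured filters combine with `q`, the sketch below builds a `GET /v1/documents/search` query string. The parameter names come from the list above. The host, the ISO date format, and the encoding of multi-valued filters as repeated parameters are assumptions and are not confirmed behavior of the endpoint.

```python
from urllib.parse import urlencode

# Hypothetical host and port; only the path comes from this document.
BASE_URL = "http://localhost:8080/v1/documents/search"


def build_search_url(filters: dict) -> str:
    """Build a GET URL from a mapping of filter names to values.

    Assumption: list values are sent as repeated parameters,
    e.g. countryCodes=DE&countryCodes=AT. None values are skipped.
    """
    pairs = []
    for name, value in filters.items():
        if value is None:
            continue
        if isinstance(value, (list, tuple)):
            pairs.extend((name, str(v)) for v in value)
        elif isinstance(value, bool):
            # JSON-style lowercase booleans instead of Python's "True"/"False".
            pairs.append((name, "true" if value else "false"))
        else:
            pairs.append((name, str(value)))
    return f"{BASE_URL}?{urlencode(pairs)}"


url = build_search_url({
    "q": "road maintenance",              # hybrid query text
    "countryCodes": ["DE", "AT"],
    "cpvPrefix": "45",
    "publicationDateFrom": "2024-01-01",  # assumed ISO date format
    "euFunded": True,
    "nutsCode": None,                     # unset filters are omitted
})
print(url)
```

Leaving out `q` from the mapping would produce a structured-only request.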

## Main classes

- `TedStructuredSearchRepository`
- `TedStructuredSearchService`
- `TedStructuredSearchController`
- `TedStructuredSearchFilter`
- `TedStructuredSearchFacets`

## How hybrid search works

For requests with `q`:

1. apply TED structured filters on projection tables
2. collect matching `document_id`s
3. pass those ids into NEW generic search scope as `candidateDocumentIds`
4. let NEW search engines rank those TED documents
5. map ranked hits back to TED summaries

This gives structured filtering plus lexical/trigram/semantic relevance ranking.
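
The five steps can be sketched in a few lines. This is a simplified in-memory model and not the service code: `Notice`, `term_overlap`, and the function names are hypothetical, the scorer is a toy stand-in for the lexical/trigram/semantic engines, and truncating the candidate set at the limit is an assumption about how `structured-search-hybrid-candidate-limit` is applied.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass(frozen=True)
class Notice:
    """Hypothetical stand-in for a row of the TED projection."""
    document_id: int
    country_code: str
    title: str


def structured_candidates(notices: List[Notice],
                          predicate: Callable[[Notice], bool],
                          limit: int) -> Set[int]:
    # Steps 1-2: apply structured filters and collect matching document ids.
    # Assumption: the set is simply truncated once the limit is reached.
    return {n.document_id for n in [n for n in notices if predicate(n)][:limit]}


def rank_within(notices: List[Notice],
                candidate_ids: Set[int],
                score: Callable[[Notice], float]) -> List[Tuple[int, float]]:
    # Steps 3-4: the generic ranker only sees documents inside the candidate scope.
    scored = [(n.document_id, score(n)) for n in notices
              if n.document_id in candidate_ids]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def term_overlap(query: str) -> Callable[[Notice], float]:
    # Toy scorer: number of query terms that appear in the title.
    terms = query.lower().split()
    return lambda n: float(sum(t in n.title.lower().split() for t in terms))


def hybrid_search(notices, predicate, query, limit=5000):
    ids = structured_candidates(notices, predicate, limit)
    ranked = rank_within(notices, ids, term_overlap(query))
    by_id = {n.document_id: n for n in notices}
    # Step 5: map ranked hits back to summaries (here: id, title, score).
    return [(doc_id, by_id[doc_id].title, s) for doc_id, s in ranked]


notices = [
    Notice(1, "DE", "Road maintenance services"),
    Notice(2, "FR", "Road maintenance works"),
    Notice(3, "DE", "School catering"),
]
hits = hybrid_search(notices, lambda n: n.country_code == "DE", "road maintenance")
print(hits)
```

In this model, notice 2 matches the query text but never reaches the ranker because the structured filter excluded it first.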

## New configuration

```yaml
dip:
  ted:
    projection:
      structured-search-hybrid-candidate-limit: 5000
      structured-search-facet-bucket-limit: 12
```

## Current behavior notes

- Structured-only requests work without `q`
- Hybrid requests use `q` and NEW generic ranking
- When `q` is present, returned `similarity` contains the fused NEW search score
- Facets are computed from the structured candidate set before pagination, so counts cover all matches and not only the current page
- `includeFacets=false` disables facet calculation
- `facetBucketLimit` overrides the configured default bucket limit (`structured-search-facet-bucket-limit`) for that request
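
A minimal sketch of the facet behavior described in these notes, assuming in-memory rows. The row keys, the response keys, and the ordering of buckets by descending count are hypothetical. Only the `YYYY-MM` month format, the two-digit CPV family, the default limit of 12, and the `includeFacets` / `facetBucketLimit` semantics come from this document.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

DEFAULT_FACET_BUCKET_LIMIT = 12  # mirrors structured-search-facet-bucket-limit


def top_buckets(values: Iterable[str], limit: int) -> List[Tuple[str, int]]:
    # Assumption: buckets are ordered by descending count and cut at the limit.
    return Counter(values).most_common(limit)


def search_page(candidates: List[dict], page: int, size: int,
                include_facets: bool = True,
                facet_bucket_limit: Optional[int] = None) -> dict:
    # A per-request limit, when given, takes precedence over the default.
    limit = (facet_bucket_limit if facet_bucket_limit is not None
             else DEFAULT_FACET_BUCKET_LIMIT)
    facets = None
    if include_facets:
        # Facets use the full candidate set, not the page slice below.
        facets = {
            "countries": top_buckets(
                (c["country_code"] for c in candidates), limit),
            "publicationMonths": top_buckets(
                (c["publication_date"][:7] for c in candidates), limit),
            "cpvFamilies": top_buckets(
                (c["cpv_code"][:2] for c in candidates), limit),
        }
    start = page * size
    return {"content": candidates[start:start + size], "facets": facets}


candidates = [
    {"document_id": 1, "country_code": "DE",
     "publication_date": "2024-03-15", "cpv_code": "45233141"},
    {"document_id": 2, "country_code": "DE",
     "publication_date": "2024-03-02", "cpv_code": "45000000"},
    {"document_id": 3, "country_code": "FR",
     "publication_date": "2024-04-10", "cpv_code": "72000000"},
]
result = search_page(candidates, page=0, size=2)
print(result["facets"])
```

With a page size of 2, the French notice is not on the first page but still contributes to the `countries` facet.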

## Compatibility notes

- The NEW endpoint reuses the legacy `DocumentDtos.SearchRequest` and `SearchResponse`
- The response was extended with optional `facets`
- Existing legacy clients remain compatible because the `facets` field is optional and purely additive
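
To illustrate why an additive field does not affect existing consumers, the sketch below parses a response with and without `facets` using the same legacy-style reader. The payload shape (`results`, `documentId`, and the bucket layout inside `facets`) is hypothetical; only `similarity` and `facets` are names taken from this document.

```python
import json

# Hypothetical payloads; the real SearchResponse layout may differ.
LEGACY_BODY = '{"results": [{"documentId": 1, "similarity": 0.82}]}'
NEW_BODY = (
    '{"results": [{"documentId": 1, "similarity": 0.82}],'
    ' "facets": {"countries": [{"value": "DE", "count": 1}]}}'
)


def read_hits(body: str) -> list:
    """Legacy-style reader: picks only the fields it knows about."""
    payload = json.loads(body)
    return [(hit["documentId"], hit["similarity"]) for hit in payload["results"]]


legacy_hits = read_hits(LEGACY_BODY)
new_hits = read_hits(NEW_BODY)
print(legacy_hits == new_hits)
```

This holds for clients that ignore unknown properties. A client configured to reject unknown properties during deserialization would need that setting relaxed before it could consume the extended response.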

## Parity scope

Parity is implemented for **shared structured filters** between legacy and NEW runtime.

Good parity candidates:
- country
- notice type
- contract nature
- procedure type
- publication date range
- submission deadline after
- EU funded
- buyer name contains
- project title contains

Parity with legacy is **not exact** for filters that legacy `SearchService` does not implement in structured mode, in particular:
- lot/organization-expanded `cpvPrefix`
- `cpvCodes`
- `nutsCode`
- `nutsCodes`
- lot-level EU funded semantics

Those are NEW-runtime improvements on top of legacy behavior.
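
One way to express this scope in a parity check is sketched below. It is a hypothetical helper and not the project's test code. The mapping from the bullet names to request parameter names is an assumption, `euFunded` is treated as shared only at notice level, and `countryCodes` is left out because the document does not say whether legacy supports it.

```python
from typing import Dict, Iterable, List

# Assumed parameter names for the "good parity candidates" listed above.
SHARED_FILTERS = frozenset({
    "countryCode", "noticeType", "contractNature", "procedureType",
    "publicationDateFrom", "publicationDateTo", "submissionDeadlineAfter",
    "euFunded", "buyerNameContains", "projectTitleContains",
})


def is_parity_candidate(request: Dict[str, object]) -> bool:
    """True when every filter that is set belongs to the shared set."""
    used = {name for name, value in request.items() if value is not None}
    return used <= SHARED_FILTERS


def parity_diff(legacy_ids: Iterable[int],
                new_ids: Iterable[int]) -> Dict[str, List[int]]:
    """Report document ids returned by only one of the two runtimes."""
    legacy, new = set(legacy_ids), set(new_ids)
    return {"only_legacy": sorted(legacy - new),
            "only_new": sorted(new - legacy)}


request = {"countryCode": "DE", "noticeType": "cn", "nutsCode": None}
print(is_parity_candidate(request))
print(parity_diff([1, 2, 3], [2, 3, 4]))
```

A request that sets `cpvCodes` or `nutsCode` would fail `is_parity_candidate` and could be excluded from exact comparison, since differences there are expected.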