# Wave 2 — NEW TED Structured Search
## Purpose
Wave 2 adds a NEW-runtime TED search endpoint that keeps the legacy request and response shape of `/v1/documents/search`, but executes the search against `TED.ted_notice_projection` instead of the legacy search path.
The goal is twofold:
1. provide NEW-runtime structured TED search functionality
2. make cutover measurable through parity checks against the legacy search implementation
## Runtime scope
This functionality is active only in `RuntimeMode.NEW`.
Controller:
- `at.procon.dip.domain.ted.web.TedStructuredSearchController`
Service:
- `at.procon.dip.domain.ted.service.TedStructuredSearchService`
Repository:
- `at.procon.dip.domain.ted.search.TedStructuredSearchRepository`
## Endpoint
### GET
`GET /v1/documents/search`
### POST
`POST /v1/documents/search`
The POST body uses the existing legacy-compatible DTO:
- `at.procon.ted.model.dto.DocumentDtos.SearchRequest`
The response uses:
- `at.procon.ted.model.dto.DocumentDtos.SearchResponse`
## Implemented structured filters
The Wave 2 implementation supports these filters:
- `countryCode`
- `countryCodes`
- `noticeType`
- `contractNature`
- `procedureType`
- `cpvPrefix`
- `cpvCodes`
- `nutsCode`
- `nutsCodes`
- `publicationDateFrom`
- `publicationDateTo`
- `submissionDeadlineAfter`
- `euFunded`
- `buyerNameContains`
- `projectTitleContains`
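One way to picture how optional filters like these turn into a query is a builder that adds a predicate only when a filter is present. The following is a minimal sketch, not the actual `TedStructuredSearchRepository` code: the class name, the column names (`country_code`, `cpv_code`, and so on), the prefix and case-insensitive matching rules, and the reduced set of five filters are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only: class name, column names, and matching rules are assumptions.
final class TedFilterSqlSketch {

    /** Generated WHERE clause plus its named parameters. */
    static final class Where {
        final String sql;
        final Map<String, Object> params;

        Where(String sql, Map<String, Object> params) {
            this.sql = sql;
            this.params = params;
        }
    }

    private static boolean hasText(String s) {
        return s != null && !s.isBlank();
    }

    /** Adds one predicate per filter that is actually set; absent filters are skipped. */
    static Where build(String countryCode, String noticeType, String cpvPrefix,
                       Boolean euFunded, String buyerNameContains) {
        List<String> predicates = new ArrayList<>();
        Map<String, Object> params = new LinkedHashMap<>();

        if (hasText(countryCode)) {
            predicates.add("country_code = :countryCode");
            params.put("countryCode", countryCode);
        }
        if (hasText(noticeType)) {
            predicates.add("notice_type = :noticeType");
            params.put("noticeType", noticeType);
        }
        if (hasText(cpvPrefix)) {
            // Assumed prefix semantics: match any CPV code starting with the given digits.
            predicates.add("cpv_code LIKE :cpvPrefix");
            params.put("cpvPrefix", cpvPrefix + "%");
        }
        if (euFunded != null) {
            predicates.add("eu_funded = :euFunded");
            params.put("euFunded", euFunded);
        }
        if (hasText(buyerNameContains)) {
            // Assumed case-insensitive substring match.
            predicates.add("LOWER(buyer_name) LIKE :buyerName");
            params.put("buyerName", "%" + buyerNameContains.toLowerCase(Locale.ROOT) + "%");
        }

        String sql = predicates.isEmpty() ? "" : "WHERE " + String.join(" AND ", predicates);
        return new Where(sql, params);
    }

    public static void main(String[] args) {
        Where w = build("AT", "CN_STANDARD", null, true, "city");
        System.out.println(w.sql);
        System.out.println(w.params);
    }
}
```

Values are always bound as named parameters rather than concatenated into the SQL text, so filter input cannot alter the query structure.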
## Sorting and pagination
Supported sorting:
- `publicationDate`
- `submissionDeadline`
- `buyerName`
- `projectTitle`
Supported directions:
- `asc`
- `desc`
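Because sort keys arrive as free-form request values, a whitelist lookup is a common way to translate them into an `ORDER BY` clause. The sketch below assumes hypothetical column names and a hypothetical fallback (`publication_date DESC`) for missing or unknown values; the real service may reject such input instead.

```java
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only: column names and fallback behavior are assumptions.
final class TedSortSketch {

    // Supported API sort keys mapped to hypothetical projection columns.
    private static final Map<String, String> SORT_COLUMNS = Map.of(
            "publicationDate", "publication_date",
            "submissionDeadline", "submission_deadline",
            "buyerName", "buyer_name",
            "projectTitle", "project_title");

    /** Returns an ORDER BY clause built only from whitelisted columns. */
    static String orderBy(String sortBy, String sortDirection) {
        // Map.of(...) maps reject null keys, so null is normalized before the lookup.
        String key = sortBy == null ? "" : sortBy;
        String column = SORT_COLUMNS.getOrDefault(key, "publication_date");
        boolean ascending = sortDirection != null
                && sortDirection.trim().toLowerCase(Locale.ROOT).equals("asc");
        return "ORDER BY " + column + (ascending ? " ASC" : " DESC");
    }

    public static void main(String[] args) {
        System.out.println(orderBy("buyerName", "asc"));
        System.out.println(orderBy("notAColumn", null));
    }
}
```

Only values from the map ever reach the SQL text, so an unexpected `sortBy` cannot inject arbitrary SQL.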
Pagination behavior:
- page defaults to `0`
- size defaults to `DipSearchProperties.defaultPageSize`
- size is capped by `DipSearchProperties.maxPageSize`
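The defaulting and capping rules above can be sketched as a small helper. The method names are hypothetical, the two limits are passed in as plain integers standing in for the `DipSearchProperties` values, and the treatment of negative or zero input is an assumption that the source does not specify.

```java
// Illustrative sketch only: names and the handling of invalid input are assumptions.
final class TedPagingSketch {

    /** Missing or negative page numbers fall back to the first page (0). */
    static int resolvePage(Integer page) {
        return (page == null || page < 0) ? 0 : page;
    }

    /** Missing or non-positive sizes use the default; all sizes are capped at the maximum. */
    static int resolveSize(Integer size, int defaultPageSize, int maxPageSize) {
        int requested = (size == null || size <= 0) ? defaultPageSize : size;
        return Math.min(requested, maxPageSize);
    }

    /** Row offset for the resolved page; long arithmetic avoids int overflow. */
    static long offset(int page, int size) {
        return (long) page * size;
    }

    public static void main(String[] args) {
        int page = resolvePage(null);
        int size = resolveSize(500, 20, 100);
        System.out.println("page=" + page + " size=" + size + " offset=" + offset(page, size));
    }
}
```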
## Data source
The endpoint reads from:
- `TED.ted_notice_projection`
Search result quality and completeness therefore depend on how complete the Wave 1 migration and the projection backfill are.
## Functional behavior
The Wave 2 implementation is intentionally **structured-search-first**.
Although the request DTO still contains:
- `semanticQuery`
- `similarityThreshold`
these fields are currently accepted only for request compatibility and future extension. The current repository implementation does **not** apply semantic ranking or semantic filtering.
That is deliberate for Wave 2, because the main objectives are:
- structured search on the NEW model
- parity verification against legacy behavior for common structured filters
## Parity strategy
Wave 2 adds parity-focused tests that compare NEW structured search behavior against the legacy TED search for a common subset of structured filters.
Recommended parity focus:
- country filters
- notice type
- procedure type
- publication date range
- EU-funded filter
- deterministic sort order
Parity should be evaluated on:
- total result count
- ordered publication IDs / notice IDs for stable cases (queries with a deterministic sort order)
- key metadata fields in `DocumentSummary`
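A parity check on the first two criteria could look roughly like the sketch below. It is a hypothetical helper, not the project's test code: it takes the total count and the ordered ID list of one page from each implementation and returns human-readable differences. Comparing `DocumentSummary` metadata fields is left out because their structure is not described here.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a hypothetical helper, not the project's parity tests.
final class ParityCheckSketch {

    /** Returns a list of differences; an empty list means the two results agree. */
    static List<String> compare(long legacyTotal, List<String> legacyIds,
                                long newTotal, List<String> newIds) {
        List<String> diffs = new ArrayList<>();

        if (legacyTotal != newTotal) {
            diffs.add("totalCount: legacy=" + legacyTotal + ", new=" + newTotal);
        }
        if (legacyIds.size() != newIds.size()) {
            diffs.add("pageLength: legacy=" + legacyIds.size() + ", new=" + newIds.size());
        }

        // Order matters: report the first position where the two pages diverge.
        int common = Math.min(legacyIds.size(), newIds.size());
        for (int i = 0; i < common; i++) {
            if (!legacyIds.get(i).equals(newIds.get(i))) {
                diffs.add("first order mismatch at index " + i
                        + ": legacy=" + legacyIds.get(i) + ", new=" + newIds.get(i));
                break;
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        // Placeholder IDs, not real TED publication numbers.
        List<String> diffs = compare(120, List.of("A", "B", "C"), 118, List.of("A", "C", "B"));
        diffs.forEach(System.out::println);
    }
}
```

Such a comparison is only meaningful when both searches use the same filters, page, size, and a sort order that breaks ties consistently.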
## Current limitations
1. No semantic scoring is applied in the NEW structured TED search path yet.
2. No TED facets/aggregations are included yet.
3. Search is projection-based, so missing or stale `ted_notice_projection` rows can cause parity differences.
4. The Wave 2 scope is TED-specific structured retrieval, not the full generic hybrid search fusion pipeline.
## Example GET request
```http
GET /v1/documents/search?countryCode=AT&noticeType=CN_STANDARD&publicationDateFrom=2025-01-01&publicationDateTo=2025-12-31&page=0&size=20&sortBy=publicationDate&sortDirection=desc
```
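For callers that build this request programmatically, the following sketch assembles the same kind of URL with the JDK's `java.net.http` API (Java 11+). The base URL `http://localhost:8080` and the class and method names are assumptions; the request is only constructed here, not sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch only: base URL and helper names are assumptions.
final class TedSearchGetSketch {

    /** Builds the search URI, URL-encoding every parameter name and value. */
    static URI searchUri(String baseUrl, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return URI.create(baseUrl + "/v1/documents/search?" + query);
    }

    /** Wraps the URI in a GET request that asks for JSON. */
    static HttpRequest request(URI uri) {
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("countryCode", "AT");
        params.put("noticeType", "CN_STANDARD");
        params.put("publicationDateFrom", "2025-01-01");
        params.put("sortBy", "publicationDate");
        params.put("sortDirection", "desc");

        HttpRequest req = request(searchUri("http://localhost:8080", params));
        System.out.println(req.method() + " " + req.uri());
    }
}
```

To execute the request, pass it to `HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())`.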
## Example POST request
```json
{
  "countryCodes": ["AT", "DE"],
  "noticeType": "CN_STANDARD",
  "contractNature": "SERVICES",
  "procedureType": "OPEN",
  "cpvPrefix": "79000000",
  "cpvCodes": ["79341000"],
  "nutsCodes": ["AT130", "DE300"],
  "publicationDateFrom": "2025-01-01",
  "publicationDateTo": "2025-12-31",
  "submissionDeadlineAfter": "2025-06-01T00:00:00Z",
  "euFunded": true,
  "buyerNameContains": "city",
  "projectTitleContains": "digital",
  "semanticQuery": "framework agreement for digital transformation services",
  "similarityThreshold": 0.7,
  "page": 0,
  "size": 20,
  "sortBy": "publicationDate",
  "sortDirection": "desc"
}
```
## Postman collection
Use the companion file:
- `WAVE2_TED_STRUCTURED_SEARCH.postman_collection.json`
It contains:
- basic GET search
- CPV/NUTS/buyer GET example
- full POST structured request
- a parity-oriented GET request for manual comparison against legacy search
## Recommended next step after Wave 2 validation
After parity is accepted, the next logical enhancements are:
1. add TED facets and richer structural filters
2. merge structured TED narrowing with lexical/semantic ranking
3. expose a documented parity validation checklist for cutover approval