# Wave 2 — Extended TED structured search in NEW runtime

## What was added

This extension completes the missing parts from the earlier Wave 2 proposal:

1. **Projection-aware TED structured search in NEW runtime**
   - endpoint: `GET /v1/documents/search`
   - endpoint: `POST /v1/documents/search`
   - active only in `dip.runtime.mode=NEW`

2. **Repository-level joins across NEW projection model**
   - `DOC.doc_document`
   - `TED.ted_notice_projection`
   - `TED.ted_notice_lot`
   - `TED.ted_notice_organization`

3. **Extended TED structured filters**
   - `countryCode`, `countryCodes`
   - `noticeType`
   - `contractNature`
   - `procedureType`
   - `cpvPrefix`, `cpvCodes`
   - `nutsCode`, `nutsCodes`
   - `publicationDateFrom`, `publicationDateTo`
   - `submissionDeadlineAfter`
   - `euFunded`
   - `buyerNameContains`
   - `projectTitleContains`

4. **Hybrid ranking path**
   - structured filters first narrow the candidate `document_id` set
   - generic NEW lexical/trigram/semantic search ranks only inside that candidate set
   - request parameter `q` is used as the hybrid query text
   - `similarityThreshold` is forwarded as a per-request semantic threshold override

5. **Facets**
   - countries
   - notice types
   - procedure types
   - buyers
   - publication months (`YYYY-MM`)
   - CPV families (first 2 digits)

6. **Parity coverage**
   - NEW structured-only parity test against legacy `SearchService` for shared filters
   - NEW endpoint integration test for structured results + facets

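As a rough illustration of items 1 and 3, the sketch below builds a query string for the `GET /v1/documents/search` endpoint in plain Java. Only the path and the parameter names come from this document. The host `http://localhost:8080`, the class and method names, and the value formats (two-letter country code, ISO dates, `true`/`false` booleans) are assumptions made for the example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical client-side helper; not part of the runtime itself.
class TedSearchUrlSketch {

    // Joins URL-encoded key=value pairs onto the search path.
    static String buildSearchUrl(String baseUrl, Map<String, String> params) {
        String query = params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
        return baseUrl + "/v1/documents/search" + (query.isEmpty() ? "" : "?" + query);
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Example request: hybrid query text plus a few structured filters.
    // All values below are illustrative assumptions.
    static String example() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "railway signalling");
        params.put("countryCode", "DE");
        params.put("cpvPrefix", "45");
        params.put("publicationDateFrom", "2024-01-01");
        params.put("euFunded", "true");
        params.put("includeFacets", "true");
        return buildSearchUrl("http://localhost:8080", params);
    }
}
```

Leaving out `q` in such a request would correspond to a structured-only search; whether the `POST` variant takes the same names as JSON body fields is not stated here and would need checking against `DocumentDtos.SearchRequest`.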
## Main classes

- `TedStructuredSearchRepository`
- `TedStructuredSearchService`
- `TedStructuredSearchController`
- `TedStructuredSearchFilter`
- `TedStructuredSearchFacets`

## How hybrid search works

For requests with `q`:

1. apply TED structured filters on projection tables
2. collect matching `document_id`s
3. pass those ids into NEW generic search scope as `candidateDocumentIds`
4. let NEW search engines rank those TED documents
5. map ranked hits back to TED summaries

This gives structured filtering plus lexical/trigram/semantic relevance ranking.

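The five steps above can be sketched in plain Java roughly as follows. This is a simplified, in-memory illustration and not the actual service code: `Notice`, `Hit`, `RankedNotice`, and `GenericSearch` are hypothetical stand-ins, and using `candidateLimit` to cap the candidate set is an assumption about what `structured-search-hybrid-candidate-limit` controls.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class HybridSearchSketch {

    // Hypothetical, simplified stand-in for a TED projection row.
    record Notice(long documentId, String countryCode, String title) {}

    // Hypothetical ranked hit returned by the generic NEW search.
    record Hit(long documentId, double score) {}

    // Hypothetical TED summary carrying the fused score as similarity.
    record RankedNotice(Notice notice, double similarity) {}

    // Hypothetical view of the generic search, scoped to candidate ids.
    interface GenericSearch {
        List<Hit> rank(String q, Set<Long> candidateDocumentIds);
    }

    static List<RankedNotice> search(List<Notice> projection,
                                     Predicate<Notice> structuredFilter,
                                     String q,
                                     int candidateLimit,
                                     GenericSearch genericSearch) {
        // Steps 1-2: apply structured filters and collect matching document ids.
        Map<Long, Notice> candidates = projection.stream()
                .filter(structuredFilter)
                .limit(candidateLimit) // assumed meaning of the candidate limit
                .collect(Collectors.toMap(Notice::documentId, n -> n));

        // Steps 3-4: the generic search ranks only inside the candidate set.
        List<Hit> hits = genericSearch.rank(q, candidates.keySet());

        // Step 5: map ranked hits back to summaries, best score first.
        // The filter defensively drops any hit outside the candidate scope.
        return hits.stream()
                .filter(h -> candidates.containsKey(h.documentId()))
                .sorted(Comparator.comparingDouble(Hit::score).reversed())
                .map(h -> new RankedNotice(candidates.get(h.documentId()), h.score()))
                .collect(Collectors.toList());
    }
}
```

The point of the design is that the ranking engines never see documents that failed the structured filters, so relevance scores only order an already valid result set.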
## New configuration

```yaml
dip:
  ted:
    projection:
      structured-search-hybrid-candidate-limit: 5000
      structured-search-facet-bucket-limit: 12
```

## Current behavior notes

- Structured-only requests work without `q`
- Hybrid requests use `q` and NEW generic ranking
- When `q` is present, returned `similarity` contains the fused NEW search score
- Facets are computed from the structured candidate set before pagination
- `includeFacets=false` disables facet calculation
- `facetBucketLimit` overrides the default number of facet buckets per request

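The facet-related notes above can be illustrated with a small in-memory sketch. It is not the repository implementation (which presumably aggregates in SQL); the class name, the ordering rule (most frequent first, ties broken by key), and the helper signatures are assumptions. The default of 12 mirrors `structured-search-facet-bucket-limit`.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class FacetSketch {

    static final int DEFAULT_BUCKET_LIMIT = 12;

    // Counts bucket keys over the FULL candidate list and keeps the top buckets.
    // A null facetBucketLimit falls back to the configured default.
    static <T> Map<String, Long> facet(List<T> candidates,
                                       Function<T, String> bucketKey,
                                       Integer facetBucketLimit) {
        int limit = facetBucketLimit != null ? facetBucketLimit : DEFAULT_BUCKET_LIMIT;
        Map<String, Long> counts = candidates.stream()
                .collect(Collectors.groupingBy(bucketKey, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(limit)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    // CPV family bucket: the first 2 digits of a CPV code.
    static String cpvFamily(String cpvCode) {
        return cpvCode.substring(0, 2);
    }

    // Publication month bucket in YYYY-MM form.
    static String publicationMonth(LocalDate publicationDate) {
        return YearMonth.from(publicationDate).toString();
    }

    // Pagination is applied separately, after facets were computed.
    static <T> List<T> page(List<T> items, int page, int size) {
        int from = Math.min(page * size, items.size());
        int to = Math.min(from + size, items.size());
        return items.subList(from, to);
    }
}
```

Because `facet` receives the whole candidate list and `page` is applied afterwards, the facet counts stay the same regardless of which page a client requests.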
## Compatibility notes

- The NEW endpoint reuses the legacy `DocumentDtos.SearchRequest` and `SearchResponse`
- The response was extended with optional `facets`
- Existing legacy clients remain compatible as long as they tolerate unknown JSON fields, because `facets` is purely additive

## Parity scope

Parity is implemented for **shared structured filters** between legacy and NEW runtime.

Good parity candidates:
- country
- notice type
- contract nature
- procedure type
- publication date range
- submission deadline after
- EU funded
- buyer name contains
- project title contains

Legacy structured parity is **not exact** for filters that legacy `SearchService` does not implement in structured mode, especially:
- lot/organization-expanded `cpvPrefix`
- `cpvCodes`
- `nutsCode`
- `nutsCodes`
- lot-level EU funded semantics

Those are NEW-runtime improvements on top of legacy behavior.

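A parity check for a shared filter could look roughly like the sketch below: run the same filter through both implementations and compare the returned document ids as sets. The `SharedFilter` record and the two search functions are hypothetical placeholders, not the real `SearchService` or `TedStructuredSearchService` signatures.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

class ParitySketch {

    // Hypothetical subset of filters that both runtimes support.
    record SharedFilter(String countryCode, String noticeType) {}

    // Returns ids found by exactly one runtime; an empty set means parity.
    static Set<Long> mismatches(SharedFilter filter,
                                Function<SharedFilter, List<Long>> legacySearch,
                                Function<SharedFilter, List<Long>> newSearch) {
        Set<Long> legacyIds = new TreeSet<>(legacySearch.apply(filter));
        Set<Long> newIds = new TreeSet<>(newSearch.apply(filter));

        Set<Long> onlyInOne = new TreeSet<>(legacyIds);
        onlyInOne.addAll(newIds);     // union of both result sets

        Set<Long> inBoth = new TreeSet<>(legacyIds);
        inBoth.retainAll(newIds);     // intersection

        onlyInOne.removeAll(inBoth);  // symmetric difference
        return onlyInOne;
    }
}
```

Comparing id sets rather than ordered lists keeps the check focused on filter semantics; for the filters listed as not exact, a non-empty result would be expected rather than a failure.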