Wave 2: Extended TED structured search in the NEW runtime

What was added

This extension completes the parts that were missing from the earlier Wave 2 proposal:

  1. Projection-aware TED structured search in the NEW runtime

    • endpoint: GET /v1/documents/search
    • endpoint: POST /v1/documents/search
    • active only when dip.runtime.mode=NEW
  2. Repository-level joins across the NEW projection model

    • DOC.doc_document
    • TED.ted_notice_projection
    • TED.ted_notice_lot
    • TED.ted_notice_organization
  3. Extended TED structured filters

    • countryCode, countryCodes
    • noticeType
    • contractNature
    • procedureType
    • cpvPrefix, cpvCodes
    • nutsCode, nutsCodes
    • publicationDateFrom, publicationDateTo
    • submissionDeadlineAfter
    • euFunded
    • buyerNameContains
    • projectTitleContains
  4. Hybrid ranking path

    • structured filters first narrow the candidate document_id set
    • the generic NEW lexical/trigram/semantic search then ranks documents only inside that candidate set
    • the request parameter q supplies the hybrid query text
    • similarityThreshold is forwarded as a per-request semantic threshold override
  5. Facets

    • countries
    • notice types
    • procedure types
    • buyers
    • publication months (YYYY-MM)
    • CPV families (first 2 digits)
  6. Parity coverage

    • NEW structured-only parity test against the legacy SearchService for shared filters
    • NEW endpoint integration test for structured results and facets
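The structured filters in item 3 are applied to the projection tables from item 2 before any ranking happens. As a hedged illustration, the sketch below turns a few of those filters into a parameterized WHERE clause over TED.ted_notice_projection, expressing lot- and organization-level conditions as EXISTS subqueries. The records FilterSketch and Where, the alias p, and every column name (country_code, cpv_code, publication_date, notice_id, document_id, name, role) are hypothetical, as is the PostgreSQL-style TRUE literal; the actual TedStructuredSearchRepository may build its SQL quite differently.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.StringJoiner;

final class StructuredFilterSqlSketch {

    /** Hypothetical subset of TedStructuredSearchFilter; null or empty means "not set". */
    record FilterSketch(List<String> countryCodes,
                        String cpvPrefix,
                        LocalDate publicationDateFrom,
                        LocalDate publicationDateTo,
                        String buyerNameContains) {}

    /** A WHERE fragment plus its positional JDBC parameters, in order. */
    record Where(String sql, List<Object> params) {}

    static Where buildWhere(FilterSketch f) {
        StringJoiner clauses = new StringJoiner(" AND ");
        List<Object> params = new ArrayList<>();

        if (f.countryCodes() != null && !f.countryCodes().isEmpty()) {
            StringJoiner marks = new StringJoiner(", ", "p.country_code IN (", ")");
            for (String code : f.countryCodes()) {
                marks.add("?");
                params.add(code);
            }
            clauses.add(marks.toString());
        }
        if (f.cpvPrefix() != null) {
            // Lot-expanded CPV match: the notice-level code or any lot code may match.
            clauses.add("(p.cpv_code LIKE ? OR EXISTS (SELECT 1 FROM TED.ted_notice_lot l"
                    + " WHERE l.notice_id = p.notice_id AND l.cpv_code LIKE ?))");
            params.add(f.cpvPrefix() + "%");
            params.add(f.cpvPrefix() + "%");
        }
        if (f.publicationDateFrom() != null) {
            clauses.add("p.publication_date >= ?");
            params.add(f.publicationDateFrom());
        }
        if (f.publicationDateTo() != null) {
            clauses.add("p.publication_date <= ?");
            params.add(f.publicationDateTo());
        }
        if (f.buyerNameContains() != null) {
            // Case-insensitive substring match on buyer organizations (role column is hypothetical).
            clauses.add("EXISTS (SELECT 1 FROM TED.ted_notice_organization o"
                    + " WHERE o.notice_id = p.notice_id AND o.role = 'BUYER' AND LOWER(o.name) LIKE ?)");
            params.add("%" + f.buyerNameContains().toLowerCase(Locale.ROOT) + "%");
        }
        // No filters set: match every projection row.
        return new Where(clauses.length() == 0 ? "TRUE" : clauses.toString(), params);
    }

    public static void main(String[] args) {
        Where w = buildWhere(new FilterSketch(List.of("DE", "AT"), "45", null, null, "Berlin"));
        System.out.println("SELECT p.document_id FROM TED.ted_notice_projection p WHERE " + w.sql());
        System.out.println(w.params());
    }
}
```

Binding values as positional parameters keeps user input out of the SQL text. A production version would also escape LIKE wildcards (% and _) inside user-supplied substrings.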

Main classes

  • TedStructuredSearchRepository
  • TedStructuredSearchService
  • TedStructuredSearchController
  • TedStructuredSearchFilter
  • TedStructuredSearchFacets

How hybrid search works

For requests that include q:

  1. apply TED structured filters on projection tables
  2. collect matching document_ids
  3. pass those ids into NEW generic search scope as candidateDocumentIds
  4. let NEW search engines rank those TED documents
  5. map ranked hits back to TED summaries

This combines structured filtering with lexical/trigram/semantic relevance ranking.
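The five steps above can be sketched in plain Java. Everything here is hypothetical: TedSummary, Ranker, and the in-memory map stand in for the projection query, the NEW generic search engines, and the summary mapping, since the actual TedStructuredSearchService code is not shown in this document. The sketch also assumes that the candidate set is capped at dip.ted.projection.structured-search-hybrid-candidate-limit before ranking; this document names that setting but does not spell out its exact effect.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class HybridSearchSketch {

    /** Hypothetical TED summary returned to the caller. */
    record TedSummary(long documentId, String title, double similarity) {}

    /** Stand-in for the NEW generic search: returns a score per ranked document id. */
    interface Ranker {
        Map<Long, Double> rank(String q, Set<Long> candidateDocumentIds);
    }

    static List<TedSummary> search(String q,
                                   Map<Long, String> projection,        // documentId -> title
                                   Predicate<String> structuredFilter,  // stands in for the SQL filters
                                   Ranker ranker,
                                   int candidateLimit) {
        // Steps 1-2: apply structured filters and collect matching document ids.
        // Assumption: the set is capped at candidateLimit; keeping the lowest ids is arbitrary.
        Set<Long> candidates = projection.entrySet().stream()
                .filter(e -> structuredFilter.test(e.getValue()))
                .map(Map.Entry::getKey)
                .sorted()
                .limit(candidateLimit)
                .collect(Collectors.toCollection(LinkedHashSet::new));

        // Steps 3-4: the ranker only sees the candidate ids.
        Map<Long, Double> scores = ranker.rank(q, candidates);

        // Step 5: map hits back to summaries, best score first; drop any out-of-scope ids.
        return scores.entrySet().stream()
                .filter(e -> candidates.contains(e.getKey()))
                .sorted(Map.Entry.<Long, Double>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<Long, Double>comparingByKey()))
                .map(e -> new TedSummary(e.getKey(), projection.get(e.getKey()), e.getValue()))
                .toList();
    }

    public static void main(String[] args) {
        Map<Long, String> projection = Map.of(
                1L, "Road works Berlin",
                2L, "School catering Vienna",
                3L, "Road maintenance Munich");
        // Toy ranker: score = number of query words found in the title.
        Ranker toy = (query, ids) -> {
            Map<Long, Double> out = new HashMap<>();
            for (long id : ids) {
                String title = projection.get(id).toLowerCase(Locale.ROOT);
                long hits = Arrays.stream(query.toLowerCase(Locale.ROOT).split("\\s+"))
                        .filter(title::contains)
                        .count();
                if (hits > 0) {
                    out.put(id, (double) hits);
                }
            }
            return out;
        };
        System.out.println(search("road berlin", projection, t -> t.startsWith("Road"), toy, 5000));
    }
}
```

Keeping the ranker behind an interface that receives candidateDocumentIds mirrors the idea that ranking engines never see documents outside the structured scope.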

New configuration

```yaml
dip:
  ted:
    projection:
      structured-search-hybrid-candidate-limit: 5000
      structured-search-facet-bucket-limit: 12
```
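A hedged sketch of how these two settings might be held in code and combined with the per-request facetBucketLimit override listed under Current behavior notes. The record name TedProjectionSearchSettings, the positivity check, and the rule that a missing or non-positive override falls back to the default are assumptions for illustration; in a Spring Boot application such values would typically be bound from the YAML above, for example via @ConfigurationProperties.

```java
final class TedProjectionSearchSettingsSketch {

    /** Hypothetical holder for dip.ted.projection.structured-search-* settings. */
    record TedProjectionSearchSettings(int hybridCandidateLimit, int facetBucketLimit) {

        /** Defaults matching the YAML values shown above. */
        static final TedProjectionSearchSettings DEFAULTS = new TedProjectionSearchSettings(5000, 12);

        TedProjectionSearchSettings {
            if (hybridCandidateLimit <= 0 || facetBucketLimit <= 0) {
                throw new IllegalArgumentException("limits must be positive");
            }
        }

        /** Per-request facetBucketLimit wins when present and positive; otherwise the configured default. */
        int effectiveFacetBucketLimit(Integer requestOverride) {
            return (requestOverride != null && requestOverride > 0) ? requestOverride : facetBucketLimit;
        }
    }

    public static void main(String[] args) {
        TedProjectionSearchSettings s = TedProjectionSearchSettings.DEFAULTS;
        System.out.println(s.effectiveFacetBucketLimit(null)); // 12
        System.out.println(s.effectiveFacetBucketLimit(5));    // 5
    }
}
```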

Current behavior notes

  • Structured-only requests work without q
  • Hybrid requests use q and NEW generic ranking
  • When q is present, the returned similarity value holds the fused NEW search score
  • Facets are computed from the structured candidate set before pagination
  • includeFacets=false disables facet calculation
  • facetBucketLimit overrides the default facet bucket limit (structured-search-facet-bucket-limit) for a single request
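The facet rules above (built from the full structured candidate set before pagination, capped by a bucket limit, skipped when includeFacets=false) can be illustrated with three of the six facets from item 5. CandidateRow, Bucket, the facet keys (countries, publicationMonths, cpvFamilies), and the tie-breaking by key are hypothetical; only the YYYY-MM publication-month format and the two-digit CPV family rule come from this document.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

final class FacetSketch {

    /** Hypothetical projection row for one structured candidate document. */
    record CandidateRow(long documentId, String countryCode, LocalDate publicationDate, String cpvCode) {}

    /** One facet bucket: a key and the number of candidates that carry it. */
    record Bucket(String key, long count) {}

    /** Counts rows per key, sorts by count (desc) then key (asc), and keeps at most bucketLimit buckets. */
    static List<Bucket> facet(List<CandidateRow> rows, Function<CandidateRow, String> keyOf, int bucketLimit) {
        Map<String, Long> counts = rows.stream()
                .map(keyOf)
                .filter(Objects::nonNull) // rows without a value do not form a bucket
                .collect(Collectors.groupingBy(k -> k, Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(bucketLimit)
                .map(e -> new Bucket(e.getKey(), e.getValue()))
                .toList();
    }

    /** Publication month in YYYY-MM form (YearMonth.toString uses that format). */
    static String publicationMonth(CandidateRow r) {
        return r.publicationDate() == null ? null : YearMonth.from(r.publicationDate()).toString();
    }

    /** CPV family: the first two digits of the CPV code. */
    static String cpvFamily(CandidateRow r) {
        return (r.cpvCode() == null || r.cpvCode().length() < 2) ? null : r.cpvCode().substring(0, 2);
    }

    /** Facets are built from the whole candidate set, before any page is cut out of it. */
    static Map<String, List<Bucket>> facets(List<CandidateRow> candidates, boolean includeFacets, int bucketLimit) {
        if (!includeFacets) {
            return Map.of(); // includeFacets=false: skip facet calculation entirely
        }
        Map<String, List<Bucket>> out = new LinkedHashMap<>();
        out.put("countries", facet(candidates, CandidateRow::countryCode, bucketLimit));
        out.put("publicationMonths", facet(candidates, FacetSketch::publicationMonth, bucketLimit));
        out.put("cpvFamilies", facet(candidates, FacetSketch::cpvFamily, bucketLimit));
        return out;
    }

    public static void main(String[] args) {
        List<CandidateRow> candidates = List.of(
                new CandidateRow(1, "DE", LocalDate.of(2024, 3, 5), "45233140"),
                new CandidateRow(2, "DE", LocalDate.of(2024, 3, 20), "45000000"),
                new CandidateRow(3, "AT", LocalDate.of(2024, 4, 1), "72000000"));
        System.out.println(facets(candidates, true, 12));
    }
}
```

Counting over the whole candidate set, rather than the current page, is what keeps facet counts stable while the client pages through results.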

Compatibility notes

  • The NEW endpoint reuses the legacy DocumentDtos.SearchRequest and SearchResponse
  • SearchResponse was extended with an optional facets field
  • Existing legacy clients remain compatible because the new JSON field is additive, provided they ignore unknown fields

Parity scope

Parity is implemented for the structured filters shared by the legacy and NEW runtimes.

Good parity candidates:

  • country
  • notice type
  • contract nature
  • procedure type
  • publication date range
  • submission deadline after
  • EU funded
  • buyer name contains
  • project title contains

Parity with legacy is not exact for filters that the legacy SearchService does not implement in structured mode, especially:

  • lot/organization-expanded cpvPrefix
  • cpvCodes
  • nutsCode
  • nutsCodes
  • lot-level EU funded semantics

These are NEW-runtime improvements over legacy behavior.
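To make the cpvPrefix gap concrete, here is a hedged sketch contrasting a notice-level check with a lot-expanded one. It assumes, for illustration only, that the legacy path compares the prefix against a single notice-level CPV code, while the NEW runtime also considers CPV codes on TED.ted_notice_lot rows; Notice, legacyMatches, and newMatches are hypothetical names. A notice whose main CPV code does not match but one of whose lots does is found only by the lot-expanded check, which is why exact parity cannot be expected for this filter.

```java
import java.util.List;

final class CpvParitySketch {

    /** Hypothetical notice with one notice-level CPV code and per-lot CPV codes. */
    record Notice(long documentId, String mainCpvCode, List<String> lotCpvCodes) {}

    /** Assumed legacy behavior: only the notice-level CPV code is checked. */
    static boolean legacyMatches(Notice n, String cpvPrefix) {
        return n.mainCpvCode() != null && n.mainCpvCode().startsWith(cpvPrefix);
    }

    /** Lot-expanded behavior: the notice-level code or any lot code may match. */
    static boolean newMatches(Notice n, String cpvPrefix) {
        return legacyMatches(n, cpvPrefix)
                || n.lotCpvCodes().stream().anyMatch(c -> c != null && c.startsWith(cpvPrefix));
    }

    public static void main(String[] args) {
        Notice n = new Notice(42, "71000000", List.of("71300000", "45233140"));
        System.out.println(legacyMatches(n, "45")); // false
        System.out.println(newMatches(n, "45"));    // true
    }
}
```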