ted extended structured search
parent f9fa8aadf7
commit 3284205a9e
@ -0,0 +1,117 @@
# Wave 2 — Extended TED structured search in NEW runtime

## What was added

This extension completes the missing parts from the earlier Wave 2 proposal:

1. **Projection-aware TED structured search in NEW runtime**
   - endpoint: `GET /v1/documents/search`
   - endpoint: `POST /v1/documents/search`
   - active only in `dip.runtime.mode=NEW`

2. **Repository-level joins across NEW projection model**
   - `DOC.doc_document`
   - `TED.ted_notice_projection`
   - `TED.ted_notice_lot`
   - `TED.ted_notice_organization`

3. **Extended TED structured filters**
   - `countryCode`, `countryCodes`
   - `noticeType`
   - `contractNature`
   - `procedureType`
   - `cpvPrefix`, `cpvCodes`
   - `nutsCode`, `nutsCodes`
   - `publicationDateFrom`, `publicationDateTo`
   - `submissionDeadlineAfter`
   - `euFunded`
   - `buyerNameContains`
   - `projectTitleContains`

4. **Hybrid ranking path**
   - structured filters first narrow the candidate `document_id` set
   - generic NEW lexical/trigram/semantic search ranks only inside that candidate set
   - request parameter `q` is used as the hybrid query text
   - `similarityThreshold` is forwarded as a per-request semantic threshold override

5. **Facets**
   - countries
   - notice types
   - procedure types
   - buyers
   - publication months (`YYYY-MM`)
   - CPV families (first 2 digits)

6. **Parity coverage**
   - NEW structured-only parity test against legacy `SearchService` for shared filters
   - NEW endpoint integration test for structured results + facets

## Main classes

- `TedStructuredSearchRepository`
- `TedStructuredSearchService`
- `TedStructuredSearchController`
- `TedStructuredSearchFilter`
- `TedStructuredSearchFacets`

## How hybrid search works

For requests with `q`:

1. apply TED structured filters on projection tables
2. collect matching `document_id`s
3. pass those ids into NEW generic search scope as `candidateDocumentIds`
4. let NEW search engines rank those TED documents
5. map ranked hits back to TED summaries

This gives structured filtering plus lexical/trigram/semantic relevance ranking. The candidate id list is capped by `structured-search-hybrid-candidate-limit`, so for very broad filters the ranking may not see every structured match.

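The five steps above can be pictured with a minimal, self-contained sketch. It is not the project's `SearchOrchestrator`: the class `HybridFlowSketch`, the method `rankWithinCandidates`, the use of plain string ids, and the choice to drop hits below the threshold are all assumptions made for illustration. The map of scores stands in for whatever fused score the generic NEW search would return.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of "filter first, rank inside the candidate set".
final class HybridFlowSketch {

    /**
     * Keeps only documents that passed structured filtering (the candidate set),
     * drops hits below the threshold, and orders the rest by descending score.
     * Ties are broken by id so the order is deterministic.
     */
    static List<String> rankWithinCandidates(Set<String> candidateDocumentIds,
                                             Map<String, Double> fusedScoreByDocumentId,
                                             double similarityThreshold) {
        List<String> ranked = new ArrayList<>();
        for (Map.Entry<String, Double> entry : fusedScoreByDocumentId.entrySet()) {
            boolean isCandidate = candidateDocumentIds.contains(entry.getKey());
            if (isCandidate && entry.getValue() >= similarityThreshold) {
                ranked.add(entry.getKey());
            }
        }
        ranked.sort((a, b) -> {
            int byScore = Double.compare(fusedScoreByDocumentId.get(b), fusedScoreByDocumentId.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return ranked;
    }
}
```

A document with a high score that is not in the candidate set never appears in the result, which is the point of narrowing before ranking.
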
## New configuration

```yaml
dip:
  ted:
    projection:
      structured-search-hybrid-candidate-limit: 5000
      structured-search-facet-bucket-limit: 12
```

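The configured facet bucket limit acts as a default that a request may override. The sketch below mirrors the fallback expression visible in `TedStructuredSearchService`; the class and method names (`FacetLimitSketch`, `resolveFacetBucketLimit`) are hypothetical and exist only for this example.

```java
// Hypothetical helper showing the per-request override with a configured fallback.
final class FacetLimitSketch {

    /**
     * Returns the requested limit when it is present and positive,
     * otherwise the configured default (12 in the configuration shown here).
     */
    static int resolveFacetBucketLimit(Integer requestedLimit, int configuredDefault) {
        return requestedLimit != null && requestedLimit > 0 ? requestedLimit : configuredDefault;
    }
}
```
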
## Current behavior notes

- Structured-only requests work without `q`
- Hybrid requests use `q` and NEW generic ranking
- When `q` is present, returned `similarity` contains the fused NEW search score
- Facets are computed from the structured candidate set before pagination
- Facets are computed by default; `includeFacets=false` disables facet calculation
- `facetBucketLimit` overrides the configured default bucket limit for a single request

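To make the facet keys concrete, here is a small sketch of how publication-month and CPV-family buckets could be derived and truncated to a bucket limit. It runs in memory and is only an illustration: the real facets come from `TedStructuredSearchRepository`, and the class `FacetBucketSketch`, its method names, and the ordering by descending count are assumptions.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of facet bucketing.
final class FacetBucketSketch {

    private static final DateTimeFormatter MONTH = DateTimeFormatter.ofPattern("yyyy-MM");

    /** Facet key for the publication month, e.g. 2026-04. */
    static String publicationMonth(LocalDate publicationDate) {
        return publicationDate.format(MONTH);
    }

    /** Facet key for the CPV family: the first two digits of the code. */
    static String cpvFamily(String cpvCode) {
        return cpvCode.length() < 2 ? cpvCode : cpvCode.substring(0, 2);
    }

    /** Counts keys and keeps the most frequent ones, at most bucketLimit entries. */
    static Map<String, Long> topBuckets(List<String> keys, int bucketLimit) {
        Map<String, Long> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(key, 1L, Long::sum);
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        // List.sort is stable, so equal counts stay in ascending key order.
        entries.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        Map<String, Long> top = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : entries) {
            if (top.size() >= bucketLimit) {
                break;
            }
            top.put(entry.getKey(), entry.getValue());
        }
        return top;
    }
}
```

Because the counts are taken over the whole filtered set rather than one page, the bucket totals do not change as a client pages through results.
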
## Compatibility notes

- The NEW endpoint reuses the legacy `DocumentDtos.SearchRequest` and `SearchResponse`
- The response was extended with optional `facets`
- Existing legacy clients remain compatible because extra JSON fields are additive

## Parity scope

Parity is implemented for **shared structured filters** between legacy and NEW runtime.

Good parity candidates:
- country
- notice type
- contract nature
- procedure type
- publication date range
- submission deadline after
- EU funded
- buyer name contains
- project title contains

Legacy structured parity is **not exact** for filters that legacy `SearchService` does not implement in structured mode, especially:
- lot/organization-expanded `cpvPrefix`
- `cpvCodes`
- `nutsCode`
- `nutsCodes`
- lot-level EU funded semantics

Those are NEW-runtime improvements on top of legacy behavior.

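One simple way to check parity for a shared filter is to compare the document ids returned by both runtimes and inspect whatever appears on only one side. The helper below is a hypothetical sketch, not the parity test from this commit; `ParityCheckSketch`, `symmetricDifference`, and the use of string ids are assumptions.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for comparing legacy and NEW result ids.
final class ParityCheckSketch {

    /** Returns the ids that appear in exactly one of the two result lists. */
    static Set<String> symmetricDifference(List<String> legacyIds, List<String> newIds) {
        Set<String> onlyLegacy = new LinkedHashSet<>(legacyIds);
        onlyLegacy.removeAll(newIds);
        Set<String> onlyNew = new LinkedHashSet<>(newIds);
        onlyNew.removeAll(legacyIds);
        onlyLegacy.addAll(onlyNew);
        return onlyLegacy;
    }
}
```

An empty result means both runtimes matched the same documents for that filter; result order is deliberately ignored here.
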
@ -0,0 +1,178 @@
|
||||
{
|
||||
"info": {
|
||||
"_postman_id": "9f9b7a8a-b96b-4f3a-a377-0ce5b54d0a01",
|
||||
"name": "DIP Semantic Search - e5-default",
|
||||
"schema": "https://schema.getpostman.com/json/collection/v2.1.0/collection.json",
|
||||
"description": "Sample semantic and hybrid search queries against the DIP generic search endpoint using semanticModelKey=e5-default (intfloat/multilingual-e5-large)."
|
||||
},
|
||||
"item": [
|
||||
{
|
||||
"name": "Search / Semantic / English",
|
||||
"request": {
|
||||
"method": "POST",
|
||||
"header": [
|
||||
{
|
||||
"key": "Content-Type",
|
||||
"value": "application/json"
|
||||
}
|
||||
],
|
||||
"body": {
|
||||
"mode": "raw",
|
||||
"raw": "{\n \"queryText\": \"framework agreement for district heating optimization in municipal energy systems\",\n \"modes\": [\n \"SEMANTIC\"\n ],\n \"semanticModelKey\": \"e5-default\",\n \"collapseByDocument\": true,\n \"representationSelectionMode\": \"PRIMARY_AND_CHUNKS\",\n \"page\": 0,\n \"size\": 10\n}"
|
||||
},
|
||||
"url": {
|
||||
"raw": "{{baseUrl}}/search",
|
||||
"host": [
|
||||
"{{baseUrl}}"
|
||||
],
|
||||
"path": [
|
||||
"search"
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Search / Semantic / German",
|
||||
"request": {
|
||||
"method": "POST",
|
||||
"header": [
|
||||
{
|
||||
"key": "Content-Type",
|
||||
"value": "application/json"
|
||||
}
|
||||
],
|
||||
"body": {
|
||||
"mode": "raw",
|
||||
"raw": "{\n \"queryText\": \"Rahmenvertrag für die Optimierung von Fernwärmesystemen in kommunalen Energienetzen\",\n \"modes\": [\n \"SEMANTIC\"\n ],\n \"semanticModelKey\": \"e5-default\",\n \"collapseByDocument\": true,\n \"representationSelectionMode\": \"PRIMARY_AND_CHUNKS\",\n \"page\": 0,\n \"size\": 10\n}"
|
||||
},
|
||||
"url": {
|
||||
"raw": "{{baseUrl}}/search",
|
||||
"host": [
|
||||
"{{baseUrl}}"
|
||||
],
|
||||
"path": [
|
||||
"search"
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Search / Semantic / Bulgarian",
|
||||
"request": {
|
||||
"method": "POST",
|
||||
"header": [
|
||||
{
|
||||
"key": "Content-Type",
|
||||
"value": "application/json"
|
||||
}
|
||||
],
|
||||
"body": {
|
||||
"mode": "raw",
|
||||
"raw": "{\n \"queryText\": \"рамково споразумение за оптимизация на системи за централно отопление в общински енергийни мрежи\",\n \"modes\": [\n \"SEMANTIC\"\n ],\n \"semanticModelKey\": \"e5-default\",\n \"collapseByDocument\": true,\n \"representationSelectionMode\": \"PRIMARY_AND_CHUNKS\",\n \"page\": 0,\n \"size\": 10\n}"
|
||||
},
|
||||
"url": {
|
||||
"raw": "{{baseUrl}}/search",
|
||||
"host": [
|
||||
"{{baseUrl}}"
|
||||
],
|
||||
"path": [
|
||||
"search"
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Search / Hybrid / English",
|
||||
"request": {
|
||||
"method": "POST",
|
||||
"header": [
|
||||
{
|
||||
"key": "Content-Type",
|
||||
"value": "application/json"
|
||||
}
|
||||
],
|
||||
"body": {
|
||||
"mode": "raw",
|
||||
"raw": "{\n \"queryText\": \"district heating optimization framework agreement\",\n \"modes\": [\n \"HYBRID\"\n ],\n \"semanticModelKey\": \"e5-default\",\n \"collapseByDocument\": true,\n \"representationSelectionMode\": \"PRIMARY_AND_CHUNKS\",\n \"page\": 0,\n \"size\": 10\n}"
|
||||
},
|
||||
"url": {
|
||||
"raw": "{{baseUrl}}/search",
|
||||
"host": [
|
||||
"{{baseUrl}}"
|
||||
],
|
||||
"path": [
|
||||
"search"
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Search / Semantic / Generic Filters",
|
||||
"request": {
|
||||
"method": "POST",
|
||||
"header": [
|
||||
{
|
||||
"key": "Content-Type",
|
||||
"value": "application/json"
|
||||
}
|
||||
],
|
||||
"body": {
|
||||
"mode": "raw",
|
||||
"raw": "{\n \"queryText\": \"municipal energy efficiency strategy\",\n \"modes\": [\n \"SEMANTIC\"\n ],\n \"semanticModelKey\": \"e5-default\",\n \"documentTypes\": [\n \"TEXT\",\n \"HTML\",\n \"PDF\"\n ],\n \"documentFamilies\": [\n \"GENERIC\"\n ],\n \"representationTypes\": [\n \"SEMANTIC_TEXT\",\n \"CHUNK\"\n ],\n \"languageCodes\": [\n \"en\",\n \"de\",\n \"bg\"\n ],\n \"collapseByDocument\": true,\n \"representationSelectionMode\": \"PRIMARY_AND_CHUNKS\",\n \"page\": 0,\n \"size\": 10\n}"
|
||||
},
|
||||
"url": {
|
||||
"raw": "{{baseUrl}}/search",
|
||||
"host": [
|
||||
"{{baseUrl}}"
|
||||
],
|
||||
"path": [
|
||||
"search"
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Search / Debug / Semantic",
|
||||
"request": {
|
||||
"method": "POST",
|
||||
"header": [
|
||||
{
|
||||
"key": "Content-Type",
|
||||
"value": "application/json"
|
||||
}
|
||||
],
|
||||
"body": {
|
||||
"mode": "raw",
|
||||
"raw": "{\n \"queryText\": \"district heating optimization\",\n \"modes\": [\n \"SEMANTIC\"\n ],\n \"semanticModelKey\": \"e5-default\",\n \"collapseByDocument\": true,\n \"representationSelectionMode\": \"PRIMARY_AND_CHUNKS\",\n \"page\": 0,\n \"size\": 10\n}"
|
||||
},
|
||||
"url": {
|
||||
"raw": "{{baseUrl}}/search/debug",
|
||||
"host": [
|
||||
"{{baseUrl}}"
|
||||
],
|
||||
"path": [
|
||||
"search",
|
||||
"debug"
|
||||
]
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Search / Metrics",
|
||||
"request": {
|
||||
"method": "GET",
|
||||
"header": [],
|
||||
"url": {
|
||||
"raw": "{{baseUrl}}/search/metrics",
|
||||
"host": [
|
||||
"{{baseUrl}}"
|
||||
],
|
||||
"path": [
|
||||
"search",
|
||||
"metrics"
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
@ -0,0 +1,15 @@
{
  "id": "f2cf3c4b-e0f7-45ff-a9c2-32f4d3d23770",
  "name": "DIP Semantic Search Local",
  "values": [
    {
      "key": "baseUrl",
      "value": "http://localhost:8080/api",
      "type": "default",
      "enabled": true
    }
  ],
  "_postman_variable_scope": "environment",
  "_postman_exported_at": "2026-03-23T13:00:00Z",
  "_postman_exported_using": "OpenAI ChatGPT"
}
@ -0,0 +1,103 @@
|
||||
{
|
||||
"info": {
|
||||
"name": "Wave 2 TED Structured Search Extended",
|
||||
"schema": "https://schema.getpostman.com/json/collection/v2.1.0/collection.json",
|
||||
"description": "NEW runtime TED structured search with projection-aware filters, hybrid ranking, and facets."
|
||||
},
|
||||
"variable": [
|
||||
{ "key": "baseUrl", "value": "http://localhost:8080/api" }
|
||||
],
|
||||
"item": [
|
||||
{
|
||||
"name": "Structured only - GET",
|
||||
"request": {
|
||||
"method": "GET",
|
||||
"url": {
"raw": "{{baseUrl}}/v1/documents/search?countryCode=AUT&noticeType=CONTRACT_NOTICE&includeFacets=true&page=0&size=20&sortBy=publicationDate&sortDirection=desc",
|
||||
"host": ["{{baseUrl}}"],
|
||||
"path": ["v1", "documents", "search"],
|
||||
"query": [
|
||||
{ "key": "countryCode", "value": "AUT" },
|
||||
{ "key": "noticeType", "value": "CONTRACT_NOTICE" },
|
||||
{ "key": "includeFacets", "value": "true" },
|
||||
{ "key": "page", "value": "0" },
|
||||
{ "key": "size", "value": "20" },
|
||||
{ "key": "sortBy", "value": "publicationDate" },
|
||||
{ "key": "sortDirection", "value": "desc" }
|
||||
]
|
||||
}
|
||||
},
|
||||
"event": [{
|
||||
"listen": "test",
|
||||
"script": {
|
||||
"exec": [
|
||||
"pm.test('status 200', function () { pm.response.to.have.status(200); });",
|
||||
"const json = pm.response.json();",
|
||||
"pm.test('documents array exists', function () { pm.expect(json.documents).to.be.an('array'); });",
|
||||
"pm.test('facets object exists', function () { pm.expect(json.facets).to.be.an('object'); });"
|
||||
]
|
||||
}
|
||||
}]
|
||||
},
|
||||
{
|
||||
"name": "Hybrid ranked TED search - GET",
|
||||
"request": {
|
||||
"method": "GET",
|
||||
"url": {
|
||||
"raw": "{{baseUrl}}/v1/documents/search?countryCode=DEU&cpvPrefix=33&q=medical imaging systems&similarityThreshold=0.65&includeFacets=true",
|
||||
"host": ["{{baseUrl}}"],
|
||||
"path": ["v1", "documents", "search"],
|
||||
"query": [
|
||||
{ "key": "countryCode", "value": "DEU" },
|
||||
{ "key": "cpvPrefix", "value": "33" },
|
||||
{ "key": "q", "value": "medical imaging systems" },
|
||||
{ "key": "similarityThreshold", "value": "0.65" },
|
||||
{ "key": "includeFacets", "value": "true" }
|
||||
]
|
||||
}
|
||||
},
|
||||
"event": [{
|
||||
"listen": "test",
|
||||
"script": {
|
||||
"exec": [
|
||||
"pm.test('status 200', function () { pm.response.to.have.status(200); });",
|
||||
"const json = pm.response.json();",
|
||||
"pm.test('documents array exists', function () { pm.expect(json.documents).to.be.an('array'); });"
|
||||
]
|
||||
}
|
||||
}]
|
||||
},
|
||||
{
|
||||
"name": "Structured only - POST with facets",
|
||||
"request": {
|
||||
"method": "POST",
|
||||
"header": [{ "key": "Content-Type", "value": "application/json" }],
|
||||
"body": {
|
||||
"mode": "raw",
|
||||
"raw": "{\n \"countryCodes\": [\"AUT\", \"DEU\"],\n \"noticeType\": \"CONTRACT_NOTICE\",\n \"contractNature\": \"SUPPLIES\",\n \"procedureType\": \"OPEN\",\n \"publicationDateFrom\": \"2026-01-01\",\n \"publicationDateTo\": \"2026-12-31\",\n \"includeFacets\": true,\n \"facetBucketLimit\": 10,\n \"page\": 0,\n \"size\": 20,\n \"sortBy\": \"publicationDate\",\n \"sortDirection\": \"desc\"\n}"
|
||||
},
|
||||
"url": {
|
||||
"raw": "{{baseUrl}}/v1/documents/search",
|
||||
"host": ["{{baseUrl}}"],
|
||||
"path": ["v1", "documents", "search"]
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"name": "Parity-style request for shared legacy filters",
|
||||
"request": {
|
||||
"method": "POST",
|
||||
"header": [{ "key": "Content-Type", "value": "application/json" }],
|
||||
"body": {
|
||||
"mode": "raw",
|
||||
"raw": "{\n \"countryCode\": \"AUT\",\n \"noticeType\": \"CONTRACT_NOTICE\",\n \"contractNature\": \"SERVICES\",\n \"procedureType\": \"OPEN\",\n \"projectTitleContains\": \"maintenance\",\n \"publicationDateFrom\": \"2026-04-01\",\n \"publicationDateTo\": \"2026-04-30\",\n \"page\": 0,\n \"size\": 20,\n \"sortBy\": \"publicationDate\",\n \"sortDirection\": \"desc\"\n}"
|
||||
},
|
||||
"url": {
|
||||
"raw": "{{baseUrl}}/v1/documents/search",
|
||||
"host": ["{{baseUrl}}"],
|
||||
"path": ["v1", "documents", "search"]
|
||||
}
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
@ -0,0 +1,16 @@
package at.procon.dip.domain.ted.search.dto;

import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class TedStructuredSearchFacetEntry {
    private String key;
    private String label;
    private long count;
}
@ -0,0 +1,20 @@
package at.procon.dip.domain.ted.search.dto;

import java.util.List;
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class TedStructuredSearchFacets {
    private List<TedStructuredSearchFacetEntry> countries;
    private List<TedStructuredSearchFacetEntry> noticeTypes;
    private List<TedStructuredSearchFacetEntry> procedureTypes;
    private List<TedStructuredSearchFacetEntry> buyers;
    private List<TedStructuredSearchFacetEntry> publicationMonths;
    private List<TedStructuredSearchFacetEntry> cpvFamilies;
}
@ -0,0 +1,34 @@
package at.procon.dip.domain.ted.search.dto;

import at.procon.ted.model.entity.ContractNature;
import at.procon.ted.model.entity.NoticeType;
import at.procon.ted.model.entity.ProcedureType;
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.util.List;
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;

@Data
@Builder
@NoArgsConstructor
@AllArgsConstructor
public class TedStructuredSearchFilter {
    private String countryCode;
    private List<String> countryCodes;
    private NoticeType noticeType;
    private ContractNature contractNature;
    private ProcedureType procedureType;
    private String cpvPrefix;
    private List<String> cpvCodes;
    private String nutsCode;
    private List<String> nutsCodes;
    private LocalDate publicationDateFrom;
    private LocalDate publicationDateTo;
    private OffsetDateTime submissionDeadlineAfter;
    private Boolean euFunded;
    private String buyerNameContains;
    private String projectTitleContains;
}
@ -0,0 +1,30 @@
package at.procon.dip.domain.ted.search.dto;

import at.procon.ted.model.entity.ContractNature;
import at.procon.ted.model.entity.NoticeType;
import at.procon.ted.model.entity.ProcedureType;
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.util.List;
import java.util.UUID;

public record TedStructuredSearchSummaryRow(
        UUID documentId,
        String publicationId,
        String noticeId,
        NoticeType noticeType,
        String projectTitle,
        String buyerName,
        String buyerCountryCode,
        String buyerCity,
        ContractNature contractNature,
        ProcedureType procedureType,
        LocalDate publicationDate,
        OffsetDateTime submissionDeadline,
        List<String> cpvCodes,
        Integer totalLots,
        BigDecimal estimatedValue,
        String estimatedValueCurrency
) {
}
@ -1,43 +1,186 @@
|
||||
package at.procon.dip.domain.ted.service;
|
||||
|
||||
import at.procon.dip.domain.ted.config.TedProjectionProperties;
|
||||
import at.procon.dip.domain.ted.search.TedStructuredSearchRepository;
|
||||
import at.procon.dip.domain.ted.search.dto.TedStructuredSearchFacets;
|
||||
import at.procon.dip.domain.ted.search.dto.TedStructuredSearchFilter;
|
||||
import at.procon.dip.domain.ted.search.dto.TedStructuredSearchSummaryRow;
|
||||
import at.procon.dip.runtime.condition.ConditionalOnRuntimeMode;
|
||||
import at.procon.dip.runtime.config.RuntimeMode;
|
||||
import at.procon.dip.search.config.DipSearchProperties;
|
||||
import at.procon.dip.search.dto.SearchMode;
|
||||
import at.procon.dip.search.dto.SearchSortMode;
|
||||
import at.procon.dip.search.spi.SearchDocumentScope;
|
||||
import at.procon.dip.search.service.SearchOrchestrator;
|
||||
import at.procon.ted.model.dto.DocumentDtos.DocumentSummary;
|
||||
import at.procon.ted.model.dto.DocumentDtos.SearchRequest;
|
||||
import at.procon.ted.model.dto.DocumentDtos.SearchResponse;
|
||||
import java.util.LinkedHashMap;
|
||||
import java.util.List;
|
||||
import java.util.Map;
|
||||
import java.util.Set;
|
||||
import java.util.UUID;
|
||||
import java.util.stream.Collectors;
|
||||
import lombok.RequiredArgsConstructor;
|
||||
import org.springframework.stereotype.Service;
|
||||
import org.springframework.transaction.annotation.Transactional;
|
||||
import org.springframework.util.StringUtils;
|
||||
|
||||
@Service
|
||||
@ConditionalOnRuntimeMode(RuntimeMode.NEW)
|
||||
@RequiredArgsConstructor
|
||||
|
||||
@Transactional(readOnly = true)
|
||||
public class TedStructuredSearchService {
|
||||
|
||||
private final TedStructuredSearchRepository repository;
|
||||
private final DipSearchProperties searchProperties;
|
||||
private final SearchOrchestrator searchOrchestrator;
|
||||
private final TedProjectionProperties tedProjectionProperties;
|
||||
|
||||
public SearchResponse search(SearchRequest request) {
|
||||
int page = request.getPage() != null && request.getPage() >= 0 ? request.getPage() : 0;
|
||||
int size = request.getSize() != null && request.getSize() > 0 ? request.getSize() : 20;
|
||||
TedStructuredSearchFilter filter = toFilter(request);
|
||||
int facetLimit = request.getFacetBucketLimit() != null && request.getFacetBucketLimit() > 0
|
||||
? request.getFacetBucketLimit()
|
||||
: tedProjectionProperties.getStructuredSearchFacetBucketLimit();
|
||||
TedStructuredSearchFacets facets = Boolean.FALSE.equals(request.getIncludeFacets())
|
||||
? null
|
||||
: repository.computeFacets(filter, facetLimit);
|
||||
|
||||
SearchResponse response = hasQuery(request)
|
||||
? searchHybrid(request, filter, page, size)
|
||||
: searchStructuredOnly(request, filter, page, size);
|
||||
response.setFacets(facets);
|
||||
return response;
|
||||
}
|
||||
|
||||
private SearchResponse searchStructuredOnly(SearchRequest request,
|
||||
TedStructuredSearchFilter filter,
|
||||
int page,
|
||||
int size) {
|
||||
long total = repository.countDistinctDocuments(filter);
|
||||
List<TedStructuredSearchSummaryRow> rows = repository.searchStructured(filter, page, size, request.getSortBy(), request.getSortDirection());
|
||||
return SearchResponse.builder()
|
||||
.documents(rows.stream().map(this::toSummary).toList())
|
||||
.page(page)
|
||||
.size(size)
|
||||
.totalElements(total)
|
||||
.totalPages((int) Math.ceil(total / (double) size))
|
||||
.hasNext((page + 1L) * size < total)
|
||||
.hasPrevious(page > 0)
|
||||
.build();
|
||||
}
|
||||
|
||||
private SearchResponse searchHybrid(SearchRequest request,
|
||||
TedStructuredSearchFilter filter,
|
||||
int page,
|
||||
int size) {
|
||||
List<UUID> candidateIds = repository.findCandidateDocumentIds(filter, tedProjectionProperties.getStructuredSearchHybridCandidateLimit());
|
||||
if (candidateIds.isEmpty()) {
|
||||
return SearchResponse.builder()
|
||||
.documents(List.of())
|
||||
.page(page)
|
||||
.size(size)
|
||||
.totalElements(0)
|
||||
.totalPages(0)
|
||||
.hasNext(false)
|
||||
.hasPrevious(page > 0)
|
||||
.build();
|
||||
}
|
||||
|
||||
at.procon.dip.search.dto.SearchRequest genericRequest = at.procon.dip.search.dto.SearchRequest.builder()
|
||||
.queryText(request.getSemanticQuery())
|
||||
.modes(Set.of(SearchMode.HYBRID))
|
||||
.page(page)
|
||||
.size(size)
|
||||
.sortMode(resolveSortMode(request.getSortBy(), request.getSortDirection()))
|
||||
.semanticSimilarityThreshold(request.getSimilarityThreshold())
|
||||
.build();
|
||||
|
||||
var genericResponse = searchOrchestrator.search(
|
||||
genericRequest,
|
||||
new SearchDocumentScope(Set.of(), null, null, null, null, Set.copyOf(candidateIds))
|
||||
);
|
||||
|
||||
List<UUID> orderedIds = genericResponse.getHits().stream().map(hit -> hit.getDocumentId()).toList();
|
||||
Map<UUID, TedStructuredSearchSummaryRow> summaryById = repository.findSummariesByDocumentIds(orderedIds).stream()
|
||||
.collect(Collectors.toMap(TedStructuredSearchSummaryRow::documentId, row -> row, (a, b) -> a, LinkedHashMap::new));
|
||||
|
||||
List<DocumentSummary> docs = genericResponse.getHits().stream()
|
||||
.map(hit -> {
|
||||
TedStructuredSearchSummaryRow row = summaryById.get(hit.getDocumentId());
|
||||
if (row == null) {
|
||||
return null;
|
||||
}
|
||||
DocumentSummary summary = toSummary(row);
|
||||
summary.setSimilarity(hit.getFinalScore());
|
||||
return summary;
|
||||
})
|
||||
.filter(java.util.Objects::nonNull)
|
||||
.toList();
|
||||
|
||||
return SearchResponse.builder()
|
||||
.documents(docs)
|
||||
.page(page)
|
||||
.size(size)
|
||||
.totalElements(genericResponse.getTotalHits())
|
||||
.totalPages((int) Math.ceil(genericResponse.getTotalHits() / (double) size))
|
||||
.hasNext((page + 1L) * size < genericResponse.getTotalHits())
|
||||
.hasPrevious(page > 0)
|
||||
.build();
|
||||
}
|
||||
|
||||
private boolean hasQuery(SearchRequest request) {
|
||||
return StringUtils.hasText(request.getSemanticQuery());
|
||||
}
|
||||
|
||||
private TedStructuredSearchFilter toFilter(SearchRequest request) {
|
||||
return TedStructuredSearchFilter.builder()
|
||||
.countryCode(request.getCountryCode())
|
||||
.countryCodes(request.getCountryCodes())
|
||||
.noticeType(request.getNoticeType())
|
||||
.contractNature(request.getContractNature())
|
||||
.procedureType(request.getProcedureType())
|
||||
.cpvPrefix(request.getCpvPrefix())
|
||||
.cpvCodes(request.getCpvCodes())
|
||||
.nutsCode(request.getNutsCode())
|
||||
.nutsCodes(request.getNutsCodes())
|
||||
.publicationDateFrom(request.getPublicationDateFrom())
|
||||
.publicationDateTo(request.getPublicationDateTo())
|
||||
.submissionDeadlineAfter(request.getSubmissionDeadlineAfter())
|
||||
.euFunded(request.getEuFunded())
|
||||
.buyerNameContains(request.getBuyerNameContains())
|
||||
.projectTitleContains(request.getProjectTitleContains())
|
||||
.build();
|
||||
}
|
||||
|
||||
private SearchSortMode resolveSortMode(String sortBy, String sortDirection) {
|
||||
if ("projectTitle".equalsIgnoreCase(sortBy) && "asc".equalsIgnoreCase(sortDirection)) {
|
||||
return SearchSortMode.TITLE_ASC;
|
||||
}
|
||||
if ("publicationDate".equalsIgnoreCase(sortBy) || "submissionDeadline".equalsIgnoreCase(sortBy)) {
|
||||
return SearchSortMode.CREATED_AT_DESC;
|
||||
}
|
||||
return SearchSortMode.SCORE_DESC;
|
||||
}
|
||||
|
||||
private DocumentSummary toSummary(TedStructuredSearchSummaryRow row) {
|
||||
return DocumentSummary.builder()
|
||||
.id(row.documentId())
|
||||
.publicationId(row.publicationId())
|
||||
.noticeId(row.noticeId())
|
||||
.noticeType(row.noticeType())
|
||||
.projectTitle(row.projectTitle())
|
||||
.buyerName(row.buyerName())
|
||||
.buyerCountryCode(row.buyerCountryCode())
|
||||
.buyerCity(row.buyerCity())
|
||||
.contractNature(row.contractNature())
|
||||
.procedureType(row.procedureType())
|
||||
.publicationDate(row.publicationDate())
|
||||
.submissionDeadline(row.submissionDeadline())
|
||||
.cpvCodes(row.cpvCodes())
|
||||
.totalLots(row.totalLots())
|
||||
.estimatedValue(row.estimatedValue())
|
||||
.estimatedValueCurrency(row.estimatedValueCurrency())
|
||||
.build();
|
||||
}
|
||||
}
|
||||
|
||||
@ -1,133 +1,61 @@
|
||||
package at.procon.dip.domain.ted.search.integration;
|
||||
|
||||
import at.procon.dip.domain.access.DocumentVisibility;
|
||||
import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.get;
|
||||
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.jsonPath;
|
||||
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;
|
||||
|
||||
import at.procon.dip.domain.document.DocumentFamily;
|
||||
import at.procon.dip.domain.document.DocumentStatus;
|
||||
import at.procon.dip.domain.document.DocumentType;
|
||||
import at.procon.dip.domain.document.entity.Document;
|
||||
import at.procon.dip.domain.document.RepresentationType;
|
||||
import at.procon.dip.domain.ted.entity.TedNoticeProjection;
|
||||
import at.procon.dip.testsupport.AbstractTedStructuredSearchIntegrationTest;
|
||||
import at.procon.ted.model.entity.ContractNature;
|
||||
import at.procon.ted.model.entity.NoticeType;
|
||||
import at.procon.ted.model.entity.ProcedureType;
|
||||
import com.fasterxml.jackson.databind.ObjectMapper;
|
||||
import java.math.BigDecimal;
|
||||
import java.time.LocalDate;
|
||||
import java.time.OffsetDateTime;
|
||||
import java.util.UUID;
|
||||
import org.junit.jupiter.api.Test;
|
||||
import org.springframework.beans.factory.annotation.Autowired;
|
||||
import org.springframework.http.MediaType;
|
||||
import org.springframework.test.web.servlet.MockMvc;
|
||||
|
||||
import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.get;
|
||||
import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.post;
|
||||
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.jsonPath;
|
||||
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;
|
||||
|
||||
class TedStructuredSearchEndpointIntegrationTest extends AbstractTedStructuredSearchIntegrationTest {
|
||||
|
||||
@Autowired
|
||||
private MockMvc mockMvc;
|
||||
|
||||
@Autowired
|
||||
private ObjectMapper objectMapper;
|
||||
|
||||
@Test
|
||||
void getSearch_should_return_structured_results_and_facets() throws Exception {
|
||||
var created = dataFactory.createDocumentWithPrimaryRepresentation(
|
||||
"Medical imaging systems for Vienna hospital",
|
||||
"Procurement summary",
|
||||
"Imaging systems and maintenance.",
|
||||
DocumentType.TED_NOTICE,
|
||||
DocumentFamily.PROCUREMENT,
|
||||
"en",
|
||||
RepresentationType.SEMANTIC_TEXT
|
||||
);
|
||||
|
||||
tedNoticeProjectionRepository.save(TedNoticeProjection.builder()
|
||||
.document(created.document())
|
||||
.publicationId("100000-2026")
|
||||
.noticeId("notice-100000-2026")
|
||||
.noticeType(NoticeType.CONTRACT_NOTICE)
|
||||
.buyerName("Vienna General Hospital")
|
||||
.buyerCountryCode("AUT")
|
||||
.buyerCity("Vienna")
|
||||
.projectTitle("Medical imaging systems")
|
||||
.contractNature(ContractNature.SUPPLIES)
|
||||
.procedureType(ProcedureType.OPEN)
|
||||
.publicationDate(LocalDate.of(2026, 4, 10))
|
||||
.submissionDeadline(OffsetDateTime.parse("2026-05-01T10:00:00+02:00"))
|
||||
.cpvCodes(new String[]{"33110000", "33120000"})
|
||||
.totalLots(2)
|
||||
.euFunded(true)
|
||||
.build());
|
||||
|
||||
mockMvc.perform(get("/api/v1/documents/search")
|
||||
.param("countryCode", "AUT")
|
||||
.param("noticeType", "CONTRACT_NOTICE")
|
||||
.param("includeFacets", "true"))
|
||||
.andExpect(status().isOk())
|
||||
.andExpect(jsonPath("$.documents.length()").value(1))
|
||||
.andExpect(jsonPath("$.documents[0].publicationId").value("100000-2026"))
|
||||
.andExpect(jsonPath("$.documents[0].buyerName").value("Vienna General Hospital"))
|
||||
.andExpect(jsonPath("$.facets.countries[0].key").value("AUT"))
|
||||
.andExpect(jsonPath("$.facets.noticeTypes[0].key").value("CONTRACT_NOTICE"));
|
||||
}
|
||||
}
|
||||
|
||||