# Native XML Column with XPath Queries
## ✅ Implemented
The `xml_document` column is now a native PostgreSQL `xml` type, so it can be queried with PostgreSQL's built-in XPath 1.0 functions.
## Hibernate Configuration
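```java
import jakarta.persistence.Column;
import org.hibernate.annotations.JdbcTypeCode;
import org.hibernate.type.SqlTypes;
@Column(name = "xml_document", nullable = false)
@JdbcTypeCode(SqlTypes.SQLXML)
private String xmlDocument;
```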
## XPath Query Examples
### 1. **Simple XPath query (without namespaces)**
```sql
-- Extract all document IDs.
-- Note: an unprefixed name such as ID matches only elements in no namespace.
SELECT
    id,
    xpath('//ID/text()', xml_document) as document_ids
FROM ted.procurement_document;
```
### 2. **XPath with namespaces (eForms UBL)**
eForms uses XML namespaces. Every prefix used in the XPath expression must be mapped to its namespace URI in the third argument of `xpath()`:
```sql
-- Extract titles (with namespace)
SELECT
    id,
    xpath(
        '//cbc:Title/text()',
        xml_document,
        ARRAY[
            ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2'],
            ARRAY['cac', 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2']
        ]
    ) as titles
FROM ted.procurement_document;
```
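The effect of the namespace mapping can be reproduced outside the database. The following is a minimal sketch using the JDK's own `javax.xml.xpath` API, not PostgreSQL; the class name `NamespaceXPathDemo` and the tiny sample document are made up for illustration. It shows that `//Title` finds nothing in a namespaced document, while `//cbc:Title` with a prefix mapping does.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class NamespaceXPathDemo {

    static final String CBC =
        "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2";

    // Made-up sample; real eForms notices are far larger.
    static final String SAMPLE =
        "<Notice xmlns:cbc=\"" + CBC + "\"><cbc:Title>Road works</cbc:Title></Notice>";

    /** Evaluates an XPath expression to a string, resolving prefixes from the given map. */
    static String evaluate(String expression, String xml, Map<String, String> prefixes) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for prefix-based matching
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return prefixes.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
                }
                @Override public String getPrefix(String uri) { return null; }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return Collections.emptyIterator();
                }
            });
            // String result: text of the first matching node, or "" if none match.
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("with prefix:    [" + evaluate("//cbc:Title", SAMPLE, Map.of("cbc", CBC)) + "]");
        System.out.println("without prefix: [" + evaluate("//Title", SAMPLE, Map.of()) + "]");
    }
}
```

The prefix itself is arbitrary; only the URI it maps to has to match the document, which is also how the `ARRAY[ARRAY[prefix, uri]]` argument behaves in PostgreSQL.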
### 3. **Extract buyer name**
```sql
SELECT
    id,
    publication_id,
    xpath(
        '//cac:ContractingParty/cac:Party/cac:PartyName/cbc:Name/text()',
        xml_document,
        ARRAY[
            ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2'],
            ARRAY['cac', 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2']
        ]
    ) as buyer_names
FROM ted.procurement_document;
```
### 4. **Extract CPV codes**
```sql
SELECT
    id,
    xpath(
        '//cac:ProcurementProject/cac:MainCommodityClassification/cbc:ItemClassificationCode/text()',
        xml_document,
        ARRAY[
            ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2'],
            ARRAY['cac', 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2']
        ]
    ) as cpv_codes
FROM ted.procurement_document;
```
### 5. **Filter by XML content**
```sql
-- Find all documents that contain a specific CPV code
SELECT
    id,
    publication_id,
    buyer_name
FROM ted.procurement_document
WHERE xpath_exists(
    '//cac:ProcurementProject/cac:MainCommodityClassification/cbc:ItemClassificationCode[text()="45000000"]',
    xml_document,
    ARRAY[
        ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2'],
        ARRAY['cac', 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2']
    ]
);
```
### 6. **Extract estimated value**
```sql
SELECT
    id,
    xpath(
        '//cac:ProcurementProject/cac:RequestedTenderTotal/cbc:EstimatedOverallContractAmount/text()',
        xml_document,
        ARRAY[
            ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2'],
            ARRAY['cac', 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2']
        ]
    ) as estimated_values
FROM ted.procurement_document;
```
## JPA/Hibernate Native Queries
You can also use XPath in Spring Data JPA repositories. Named parameters are not substituted inside quoted SQL string literals, so the queries below keep the XPath expression constant and compare the extracted values in SQL:
### Repository example
```java
@Repository
public interface ProcurementDocumentRepository extends JpaRepository<ProcurementDocument, UUID> {

    /**
     * Finds all documents whose title contains the given text (via XPath).
     * The titles are extracted with xpath() and matched with LIKE, because
     * :searchText would not be bound inside the quoted XPath string.
     */
    @Query(value = """
        SELECT d.* FROM ted.procurement_document d
        WHERE EXISTS (
            SELECT 1
            FROM unnest(xpath(
                '//cbc:Title/text()',
                d.xml_document,
                ARRAY[
                    ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2']
                ]
            )) AS t(title)
            WHERE CAST(t.title AS text) LIKE '%' || :searchText || '%'
        )
        """, nativeQuery = true)
    List<ProcurementDocument> findByTitleContaining(@Param("searchText") String searchText);

    /**
     * Extracts CPV codes via XPath.
     * CAST(...) is used instead of :: so the colons cannot be mistaken for a named parameter.
     */
    @Query(value = """
        SELECT
            CAST(unnest(xpath(
                '//cac:ProcurementProject/cac:MainCommodityClassification/cbc:ItemClassificationCode/text()',
                xml_document,
                ARRAY[
                    ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2'],
                    ARRAY['cac', 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2']
                ]
            )) AS text) AS cpv_code
        FROM ted.procurement_document
        WHERE id = :documentId
        """, nativeQuery = true)
    List<String> extractCpvCodes(@Param("documentId") UUID documentId);

    /**
     * Finds documents by CPV code.
     */
    @Query(value = """
        SELECT d.* FROM ted.procurement_document d
        WHERE :cpvCode = ANY (CAST(xpath(
            '//cbc:ItemClassificationCode/text()',
            d.xml_document,
            ARRAY[
                ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2']
            ]
        ) AS text[]))
        """, nativeQuery = true)
    List<ProcurementDocument> findByCpvCode(@Param("cpvCode") String cpvCode);
}
```
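A named parameter such as `:searchText` is not replaced when it sits inside a quoted SQL string, so a value that has to appear inside the XPath expression itself must be embedded when the expression is built. The sketch below shows one way to quote such a value as an XPath 1.0 string literal; the class `XPathLiterals` and its `quote` method are hypothetical helpers, not part of Spring, Hibernate, or PostgreSQL. XPath 1.0 has no escape character, so a value containing both quote types is assembled with `concat()`.

```java
final class XPathLiterals {

    private XPathLiterals() {
    }

    /** Returns the value as an XPath 1.0 string literal (or a concat() call if it mixes quotes). */
    static String quote(String value) {
        if (!value.contains("'")) {
            return "'" + value + "'";
        }
        if (!value.contains("\"")) {
            return "\"" + value + "\"";
        }
        // Both quote characters occur: split on ' and glue the pieces back
        // together with a double-quoted apostrophe between them.
        String[] parts = value.split("'", -1);
        StringBuilder literal = new StringBuilder("concat(");
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                literal.append(", \"'\", ");
            }
            literal.append('\'').append(parts[i]).append('\'');
        }
        return literal.append(')').toString();
    }

    public static void main(String[] args) {
        String searchText = "O'Neill \"Ltd\"";
        // The finished expression is then passed to the database as ONE bind parameter.
        String xpath = "//cbc:Title[contains(text(), " + quote(searchText) + ")]";
        System.out.println(xpath);
    }
}
```

The resulting string would then be bound as a whole, for example as `xpath_exists(:xpath, xml_document, ...)`, so that the SQL statement itself stays constant.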
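## PostgreSQL XML Functions
Other useful XML functions:
### `xml_is_well_formed()`
```sql
-- Takes text, so the xml column needs an explicit cast.
-- Values already stored as xml have passed this check; it is most useful on text input.
SELECT xml_is_well_formed(xml_document::text) FROM ted.procurement_document;
```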
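A similar check can be run on the client before a document is sent to the database. This is a rough sketch with the JDK's DOM parser, not PostgreSQL's implementation, so edge cases may differ; the class name `WellFormedCheck` is made up. It assumes the incoming notices carry no DOCTYPE declaration, which it rejects as a precaution against external entity resolution.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class WellFormedCheck {

    private WellFormedCheck() {
    }

    /** Returns true if the text parses as a single well-formed XML document without a DOCTYPE. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Assumption: notices never need a DOCTYPE, so refuse it outright.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b/></a>"));
        System.out.println(isWellFormed("<a><b></a>"));
    }
}
```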
### `xpath_exists()` - checks whether a path exists
```sql
SELECT xpath_exists('//cbc:Title', xml_document, ...) FROM ted.procurement_document;
```
### `unnest()` - array to rows
```sql
SELECT
    id,
    unnest(xpath('//cbc:Title/text()', xml_document, ...))::text as title
FROM ted.procurement_document;
```
## Common eForms Namespaces
```sql
ARRAY[
    ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2'],
    ARRAY['cac', 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2'],
    ARRAY['ext', 'urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2'],
    ARRAY['efac', 'http://data.europa.eu/p27/eforms-ubl-extension-aggregate-components/1'],
    ARRAY['efbc', 'http://data.europa.eu/p27/eforms-ubl-extension-basic-components/1']
]
```
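Because the same namespace array recurs in every query, it may be convenient to keep the mappings in one place on the Java side. The sketch below is one possible approach; the class `EformsNamespaces` and the method `toSqlArray` are hypothetical names. It renders a prefix map into the `ARRAY[ARRAY[...]]` form shown above.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class EformsNamespaces {

    /** Prefix to namespace URI, in the order listed in this document. */
    static final Map<String, String> PREFIXES;

    static {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("cbc", "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2");
        m.put("cac", "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2");
        m.put("ext", "urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2");
        m.put("efac", "http://data.europa.eu/p27/eforms-ubl-extension-aggregate-components/1");
        m.put("efbc", "http://data.europa.eu/p27/eforms-ubl-extension-basic-components/1");
        PREFIXES = Collections.unmodifiableMap(m);
    }

    private EformsNamespaces() {
    }

    /** Renders the map as a PostgreSQL two-dimensional text array expression. */
    static String toSqlArray(Map<String, String> prefixes) {
        return prefixes.entrySet().stream()
            .map(e -> "ARRAY['" + escape(e.getKey()) + "', '" + escape(e.getValue()) + "']")
            .collect(Collectors.joining(", ", "ARRAY[", "]"));
    }

    // Doubles single quotes so the value is safe inside a SQL string literal.
    private static String escape(String s) {
        return s.replace("'", "''");
    }

    public static void main(String[] args) {
        System.out.println(toSqlArray(PREFIXES));
    }
}
```

Annotation values must be compile-time constants, so the rendered fragment cannot feed a `@Query` annotation directly; it fits query strings built at runtime, for example for `EntityManager.createNativeQuery`. Only trusted, constant mappings should be rendered this way.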
## Performance Tips
1. **Indexing**: For frequent XPath queries you can create expression indexes. `xml[]` has no GIN operator class, so the XPath result is cast to `text[]`; the index can then serve queries that apply array operators such as `@>` to the same expression:
```sql
CREATE INDEX idx_doc_title ON ted.procurement_document
    USING GIN ((xpath('//cbc:Title/text()', xml_document, ...)::text[]));
```
2. **Materialized views**: For complex XPath queries:
```sql
CREATE MATERIALIZED VIEW ted.document_titles AS
SELECT
    id,
    unnest(xpath('//cbc:Title/text()', xml_document, ...))::text as title
FROM ted.procurement_document;
```
## Advantages of the Native XML Column
✅ Native XPath queries
✅ Well-formedness is checked on insert (no DTD or schema validation)
✅ Efficient storage
✅ PostgreSQL XML functions available
✅ Structured queries on XML elements
✅ Expression indexes possible
## Hibernate Now Works Correctly
With `@JdbcTypeCode(SqlTypes.SQLXML)`, Hibernate knows that it has to bind the value as `SQLXML` for INSERT/UPDATE.
This prevents the error:
```
ERROR: column "xml_document" is of type xml but expression is of type character varying
```