# Native XML Column with XPath Queries

## ✅ Implemented

The `xml_document` column is now a native PostgreSQL `xml` type, so it can be queried with XPath 1.0 through `xpath()` and `xpath_exists()`.
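
## Hibernate Configuration

```java
import jakarta.persistence.Column;
import org.hibernate.annotations.JdbcTypeCode;
import org.hibernate.type.SqlTypes;

// Field of the entity mapped to ted.procurement_document
@Column(name = "xml_document", nullable = false)
@JdbcTypeCode(SqlTypes.SQLXML)
private String xmlDocument;
```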

## XPath Query Examples

### 1. **Simple XPath Query (Without Namespaces)**

```sql
-- Extract all document IDs.
-- An unprefixed name only matches elements that are in no namespace.
SELECT
    id,
    xpath('//ID/text()', xml_document) AS document_ids
FROM ted.procurement_document;
```

### 2. **XPath with Namespaces (eForms UBL)**

eForms documents use XML namespaces. Every prefix used in the XPath expression must be bound to its namespace URI in the third argument of `xpath()`:

```sql
-- Extract titles (with namespaces)
SELECT
    id,
    xpath(
        '//cbc:Title/text()',
        xml_document,
        ARRAY[
            ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2'],
            ARRAY['cac', 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2']
        ]
    ) AS titles
FROM ted.procurement_document;
```
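
To try a namespaced expression outside the database, the same prefix bindings can be handed to the JDK's built-in XPath 1.0 engine. The following is a minimal sketch, not part of the project: the class name `XPathNamespaceCheck`, the `evaluate` helper, and the sample document are invented for illustration, and the JDK engine is assumed to behave like PostgreSQL's for simple expressions such as these.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathNamespaceCheck {

    // Same prefix-to-URI bindings as the ARRAY[...] argument in the SQL examples.
    static final Map<String, String> NAMESPACES = Map.of(
        "cbc", "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
        "cac", "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2");

    // Invented sample document (assumption), only loosely shaped like an eForms notice.
    static final String SAMPLE = """
        <ContractNotice
            xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
            xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2">
          <cac:ProcurementProject>
            <cbc:Title>Road works</cbc:Title>
          </cac:ProcurementProject>
        </ContractNotice>
        """;

    /** Evaluates an XPath 1.0 expression and returns the text content of each matched node. */
    static List<String> evaluate(String expression, String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required, otherwise prefixes never match
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    String uri = NAMESPACES.get(prefix);
                    return uri != null ? uri : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) { return null; }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return Collections.emptyIterator();
                }
            });

            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate XPath: " + expression, e);
        }
    }
}
```

Calling `XPathNamespaceCheck.evaluate("//cbc:Title/text()", XPathNamespaceCheck.SAMPLE)` returns the title text, while the unprefixed `//Title/text()` matches nothing because the element lives in the `cbc` namespace.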

### 3. **Extracting the Buyer Name**

```sql
SELECT
    id,
    publication_id,
    xpath(
        '//cac:ContractingParty/cac:Party/cac:PartyName/cbc:Name/text()',
        xml_document,
        ARRAY[
            ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2'],
            ARRAY['cac', 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2']
        ]
    ) AS buyer_names
FROM ted.procurement_document;
```

### 4. **Extracting CPV Codes**

```sql
SELECT
    id,
    xpath(
        '//cac:ProcurementProject/cac:MainCommodityClassification/cbc:ItemClassificationCode/text()',
        xml_document,
        ARRAY[
            ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2'],
            ARRAY['cac', 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2']
        ]
    ) AS cpv_codes
FROM ted.procurement_document;
```

### 5. **Filtering by XML Content**

```sql
-- Find all documents that contain a specific CPV code
SELECT
    id,
    publication_id,
    buyer_name
FROM ted.procurement_document
WHERE xpath_exists(
    '//cac:ProcurementProject/cac:MainCommodityClassification/cbc:ItemClassificationCode[text()="45000000"]',
    xml_document,
    ARRAY[
        ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2'],
        ARRAY['cac', 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2']
    ]
);
```

### 6. **Extracting the Estimated Value**

```sql
SELECT
    id,
    xpath(
        '//cac:ProcurementProject/cac:RequestedTenderTotal/cbc:EstimatedOverallContractAmount/text()',
        xml_document,
        ARRAY[
            ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2'],
            ARRAY['cac', 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2']
        ]
    ) AS estimated_values
FROM ted.procurement_document;
```

## JPA/Hibernate Native Queries

XPath can also be used in Spring Data JPA repositories through native queries:

### Repository Example

```java
import java.util.List;
import java.util.UUID;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;
import org.springframework.stereotype.Repository;

@Repository
public interface ProcurementDocumentRepository extends JpaRepository<ProcurementDocument, UUID> {

    /**
     * Finds all documents with a title containing the given text.
     * The titles are extracted via XPath and compared in SQL, because a named
     * parameter inside the quoted XPath string would not be bound.
     */
    @Query(value = """
        SELECT d.* FROM ted.procurement_document d
        WHERE EXISTS (
            SELECT 1
            FROM unnest(xpath(
                '//cbc:Title/text()',
                d.xml_document,
                ARRAY[
                    ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2']
                ]
            )) AS t(title)
            WHERE strpos(CAST(t.title AS text), :searchText) > 0
        )
        """, nativeQuery = true)
    List<ProcurementDocument> findByTitleContaining(@Param("searchText") String searchText);

    /**
     * Extracts the CPV codes of one document via XPath.
     * CAST(...) is used instead of the '::' shorthand, which some Hibernate
     * versions misread as a named parameter.
     */
    @Query(value = """
        SELECT
            CAST(unnest(xpath(
                '//cac:ProcurementProject/cac:MainCommodityClassification/cbc:ItemClassificationCode/text()',
                xml_document,
                ARRAY[
                    ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2'],
                    ARRAY['cac', 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2']
                ]
            )) AS text) AS cpv_code
        FROM ted.procurement_document
        WHERE id = :documentId
        """, nativeQuery = true)
    List<String> extractCpvCodes(@Param("documentId") UUID documentId);

    /**
     * Finds documents by CPV code. The code is compared in SQL so that
     * :cpvCode stays a real bind parameter.
     */
    @Query(value = """
        SELECT d.* FROM ted.procurement_document d
        WHERE :cpvCode = ANY (CAST(xpath(
            '//cbc:ItemClassificationCode/text()',
            d.xml_document,
            ARRAY[
                ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2']
            ]
        ) AS text[]))
        """, nativeQuery = true)
    List<ProcurementDocument> findByCpvCode(@Param("cpvCode") String cpvCode);
}
```
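
A named parameter that sits inside a quoted SQL string is not substituted, so a search term cannot simply be dropped into the XPath literal. If the term really has to appear in the XPath expression itself, one option is to quote it as an XPath 1.0 string literal in Java and bind the finished expression as a single parameter. The helper below is a hypothetical sketch (the class `XPathLiterals` and its `quote` method are not part of the project); it relies on the fact that XPath 1.0 has no escape character, so values containing both quote styles are assembled with `concat()`.

```java
final class XPathLiterals {

    private XPathLiterals() {
    }

    /**
     * Quotes a value as an XPath 1.0 string literal.
     * XPath 1.0 has no escape character, so a value that contains both
     * quote styles is split at each apostrophe and rebuilt with concat().
     */
    static String quote(String value) {
        if (!value.contains("'")) {
            return "'" + value + "'";
        }
        if (!value.contains("\"")) {
            return "\"" + value + "\"";
        }
        StringBuilder literal = new StringBuilder("concat(");
        String[] parts = value.split("'", -1); // -1 keeps trailing empty parts
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                literal.append(", \"'\", "); // the apostrophe itself, in double quotes
            }
            literal.append('\'').append(parts[i]).append('\'');
        }
        return literal.append(')').toString();
    }
}
```

An expression such as `"//cbc:Title[contains(text(), " + XPathLiterals.quote(term) + ")]"` could then be passed as the first argument of `xpath_exists()` through an ordinary bind parameter. Comparing the extracted values in SQL remains the simpler choice whenever it is sufficient.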
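
## PostgreSQL XML Functions

Other useful XML functions:

### `xml_is_well_formed()`
```sql
-- Takes text, not xml. Values stored in an xml column were already checked on input,
-- so this is mainly useful for text that has not been cast to xml yet.
SELECT xml_is_well_formed(xml_document::text) FROM ted.procurement_document;
```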

### `xpath_exists()` - Checks Whether a Path Exists
```sql
-- "..." stands for the namespace array
SELECT xpath_exists('//cbc:Title', xml_document, ...) FROM ted.procurement_document;
```

### `unnest()` - Array to Rows
```sql
SELECT
    id,
    unnest(xpath('//cbc:Title/text()', xml_document, ...))::text AS title
FROM ted.procurement_document;
```

## Common eForms Namespaces

```sql
ARRAY[
    ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2'],
    ARRAY['cac', 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2'],
    ARRAY['ext', 'urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2'],
    ARRAY['efac', 'http://data.europa.eu/p27/eforms-ubl-extension-aggregate-components/1'],
    ARRAY['efbc', 'http://data.europa.eu/p27/eforms-ubl-extension-basic-components/1']
]
```
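
Because the same namespace array recurs in every query, it may be convenient to keep the bindings in one place and render the SQL fragment from them. The sketch below is an assumption about how that could look; `EformsNamespaces`, `COMMON`, and `toSqlArray` are hypothetical names. It only helps for queries assembled at runtime (for example with `EntityManager.createNativeQuery`), since `@Query` values must be compile-time constants.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class EformsNamespaces {

    /** Prefix-to-URI bindings from the list above, in the same order. */
    static final Map<String, String> COMMON = common();

    private EformsNamespaces() {
    }

    private static Map<String, String> common() {
        Map<String, String> ns = new LinkedHashMap<>(); // keeps insertion order
        ns.put("cbc", "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2");
        ns.put("cac", "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2");
        ns.put("ext", "urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2");
        ns.put("efac", "http://data.europa.eu/p27/eforms-ubl-extension-aggregate-components/1");
        ns.put("efbc", "http://data.europa.eu/p27/eforms-ubl-extension-basic-components/1");
        return Collections.unmodifiableMap(ns);
    }

    /** Renders the bindings as a PostgreSQL two-dimensional text array constructor. */
    static String toSqlArray(Map<String, String> namespaces) {
        return namespaces.entrySet().stream()
            .map(e -> "ARRAY['" + escape(e.getKey()) + "', '" + escape(e.getValue()) + "']")
            .collect(Collectors.joining(", ", "ARRAY[", "]"));
    }

    // Doubles single quotes so the value stays inside its SQL string literal.
    private static String escape(String value) {
        return value.replace("'", "''");
    }
}
```

`EformsNamespaces.toSqlArray(EformsNamespaces.COMMON)` yields a fragment that can be appended as the third argument of `xpath()` or `xpath_exists()`.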

## Performance Tips

1. **Indexing**: For frequent XPath queries you can create expression indexes. The `xml[]` result of `xpath()` has no GIN operator class, so it is cast to `text[]`; a query only uses the index if it contains the same expression:
   ```sql
   CREATE INDEX idx_doc_title ON ted.procurement_document
   USING GIN ((xpath('//cbc:Title/text()', xml_document, ...)::text[]));
   ```

2. **Materialized Views**: For complex XPath queries:
   ```sql
   -- Refresh after data changes with: REFRESH MATERIALIZED VIEW ted.document_titles;
   CREATE MATERIALIZED VIEW ted.document_titles AS
   SELECT
       id,
       unnest(xpath('//cbc:Title/text()', xml_document, ...))::text AS title
   FROM ted.procurement_document;
   ```

## Advantages of the Native XML Column

- ✅ Native XPath queries
- ✅ Well-formedness is checked on input (PostgreSQL does not validate against a DTD or schema)
- ✅ Efficient storage
- ✅ PostgreSQL XML functions available
- ✅ Structured queries on XML elements
- ✅ Expression indexes possible

## Hibernate Now Works Correctly

With `@JdbcTypeCode(SqlTypes.SQLXML)`, Hibernate knows that it must bind the value as `SQLXML` on INSERT/UPDATE.

This prevents the error:
```
ERROR: column "xml_document" is of type xml but expression is of type character varying
```