Native XML column with XPath queries

Implemented

The xml_document column is now a native PostgreSQL xml type and can be queried with PostgreSQL's XPath 1.0 functions (xpath(), xpath_exists()).

Hibernate configuration

@Column(name = "xml_document", nullable = false)
@JdbcTypeCode(SqlTypes.SQLXML)
private String xmlDocument;

XPath query examples

1. Simple XPath query (without namespaces)

-- Extract all document IDs (matches only ID elements that are in no namespace)
SELECT
    id,
    xpath('//ID/text()', xml_document) as document_ids
FROM ted.procurement_document;

2. XPath with namespaces (eForms UBL)

eForms uses XML namespaces. You must pass them to xpath() as the third argument, an array of [prefix, URI] pairs:

-- Extract titles (with namespace)
SELECT
    id,
    xpath(
        '//cbc:Title/text()',
        xml_document,
        ARRAY[
            ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2'],
            ARRAY['cac', 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2']
        ]
    ) as titles
FROM ted.procurement_document;
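
Before running a namespaced expression against the database, it can help to try it on a sample document locally. The following is a minimal sketch using only the JDK's javax.xml.xpath API; the class name LocalXPathCheck and the method evaluate are hypothetical, and the JDK evaluator is not the engine PostgreSQL uses, so treat it as a sanity check rather than a guarantee of identical results.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the project: evaluates an XPath locally.
final class LocalXPathCheck {

    // Same prefix-to-URI pairs as the ARRAY[...] argument in the SQL queries.
    static final Map<String, String> NAMESPACES = Map.of(
        "cbc", "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
        "cac", "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2");

    /** Returns the string values of all nodes selected by the expression. */
    static List<String> evaluate(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // without this, prefixed paths match nothing
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return NAMESPACES.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
                }
                // Reverse lookups are not needed for evaluating expressions.
                @Override public String getPrefix(String uri) { return null; }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return Collections.emptyIterator();
                }
            });

            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed: " + expression, e);
        }
    }
}
```

Calling LocalXPathCheck.evaluate(sampleXml, "//cbc:Title/text()") on a document that declares the cbc namespace returns the title texts, while the unprefixed "//Title/text()" selects nothing. The same happens in PostgreSQL when the namespace array is omitted.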

3. Extract the buyer name

SELECT
    id,
    publication_id,
    xpath(
        '//cac:ContractingParty/cac:Party/cac:PartyName/cbc:Name/text()',
        xml_document,
        ARRAY[
            ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2'],
            ARRAY['cac', 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2']
        ]
    ) as buyer_names
FROM ted.procurement_document;

4. Extract CPV codes

SELECT
    id,
    xpath(
        '//cac:ProcurementProject/cac:MainCommodityClassification/cbc:ItemClassificationCode/text()',
        xml_document,
        ARRAY[
            ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2'],
            ARRAY['cac', 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2']
        ]
    ) as cpv_codes
FROM ted.procurement_document;

5. Filter by XML content

-- Find all documents that contain a specific CPV code
SELECT
    id,
    publication_id,
    buyer_name
FROM ted.procurement_document
WHERE xpath_exists(
    '//cac:ProcurementProject/cac:MainCommodityClassification/cbc:ItemClassificationCode[text()="45000000"]',
    xml_document,
    ARRAY[
        ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2'],
        ARRAY['cac', 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2']
    ]
);

6. Extract the estimated value

SELECT
    id,
    xpath(
        '//cac:ProcurementProject/cac:RequestedTenderTotal/cbc:EstimatedOverallContractAmount/text()',
        xml_document,
        ARRAY[
            ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2'],
            ARRAY['cac', 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2']
        ]
    ) as estimated_values
FROM ted.procurement_document;

JPA/Hibernate native queries

You can also use XPath in Spring Data JPA repositories:

Repository example

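import java.util.List;
import java.util.UUID;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;
import org.springframework.stereotype.Repository;

@Repository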
public interface ProcurementDocumentRepository extends JpaRepository<ProcurementDocument, UUID> {

    /**
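     * Finds all documents whose title contains the given text (via XPath).
     * The XPath is built with || because a bind parameter inside a SQL string
     * literal is not substituted. searchText must not contain double quotes.
     */
    @Query(value = """
        SELECT * FROM ted.procurement_document
        WHERE xpath_exists(
            '//cbc:Title[contains(text(), "' || :searchText || '")]',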
            xml_document,
            ARRAY[
                ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2']
            ]
        )
        """, nativeQuery = true)
    List<ProcurementDocument> findByTitleContaining(@Param("searchText") String searchText);

    /**
     * Extracts CPV codes via XPath
     */
    @Query(value = """
        SELECT
            unnest(xpath(
                '//cac:ProcurementProject/cac:MainCommodityClassification/cbc:ItemClassificationCode/text()',
                xml_document,
                ARRAY[
                    ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2'],
                    ARRAY['cac', 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2']
                ]
            ))::text as cpv_code
        FROM ted.procurement_document
        WHERE id = :documentId
        """, nativeQuery = true)
    List<String> extractCpvCodes(@Param("documentId") UUID documentId);

    /**
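     * Finds documents by CPV code.
     * The XPath is built with || because a bind parameter inside a SQL string
     * literal is not substituted. cpvCode must not contain double quotes.
     */
    @Query(value = """
        SELECT * FROM ted.procurement_document
        WHERE xpath_exists(
            '//cbc:ItemClassificationCode[text()="' || :cpvCode || '"]',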
            xml_document,
            ARRAY[
                ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2']
            ]
        )
        """, nativeQuery = true)
    List<ProcurementDocument> findByCpvCode(@Param("cpvCode") String cpvCode);
}
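
The title and CPV queries embed a caller-supplied value in the XPath expression. XPath 1.0 has no escape character inside string literals, so a value containing quotes needs special handling. The sketch below shows one common approach: pick the quote style the value does not contain, and fall back to concat() when it contains both. The class XPathLiterals and its methods are hypothetical names, not part of the project.

```java
// Hypothetical helper: builds XPath 1.0 string literals from untrusted input.
final class XPathLiterals {
    private XPathLiterals() {}

    /** Quotes a value as an XPath 1.0 string literal. */
    static String quote(String value) {
        if (!value.contains("\"")) {
            return "\"" + value + "\"";
        }
        if (!value.contains("'")) {
            return "'" + value + "'";
        }
        // Both quote types present: split on " and rejoin the pieces with concat(),
        // inserting a single-quoted double quote between them.
        StringBuilder sb = new StringBuilder("concat(");
        String[] parts = value.split("\"", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(", '\"', ");
            }
            sb.append('"').append(parts[i]).append('"');
        }
        return sb.append(')').toString();
    }

    /** Builds the title search expression used by the repository example. */
    static String titleContains(String searchText) {
        return "//cbc:Title[contains(text(), " + quote(searchText) + ")]";
    }
}
```

One possible way to use this, assuming the query is changed accordingly, is to pass the finished expression as a single bind parameter, for example xpath_exists(:xpathExpr, xml_document, ARRAY[...]), so that no user input is concatenated inside the SQL text itself.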

PostgreSQL XML functions

Other useful XML functions:

xml_is_well_formed()

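-- xml_is_well_formed() takes text, so the xml column is cast explicitly.
-- Values already stored as xml are well-formed; the check is mainly useful on text input.
SELECT xml_is_well_formed(xml_document::text) FROM ted.procurement_document;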
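
The same kind of check can be done on the application side before a document is sent to the database, which gives an early error instead of a failed INSERT. This is a minimal sketch with the JDK's built-in parser; the class WellFormedCheck is a hypothetical name. It is stricter than PostgreSQL in two ways: it expects a complete document with a single root element, and it rejects any DOCTYPE declaration as a precaution against external-entity lookups.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: client-side well-formedness check before persisting.
final class WellFormedCheck {
    private WellFormedCheck() {}

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Refuse DOCTYPE declarations to rule out external-entity (XXE) lookups.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // parse error: not well-formed (or contains a DOCTYPE)
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```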

xpath_exists() - checks whether a path exists

SELECT xpath_exists('//cbc:Title', xml_document, ...) FROM ted.procurement_document;

unnest() - array to rows

SELECT
    id,
    unnest(xpath('//cbc:Title/text()', xml_document, ...))::text as title
FROM ted.procurement_document;

Common eForms namespaces

ARRAY[
    ARRAY['cbc', 'urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2'],
    ARRAY['cac', 'urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2'],
    ARRAY['ext', 'urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2'],
    ARRAY['efac', 'http://data.europa.eu/p27/eforms-ubl-extension-aggregate-components/1'],
    ARRAY['efbc', 'http://data.europa.eu/p27/eforms-ubl-extension-basic-components/1']
]
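
Because every query repeats this array, it may be worth keeping the mappings in one place on the Java side and rendering the SQL fragment from them. The sketch below is one way to do that; the class EformsNamespaces and the method toSqlArray are hypothetical names. The rendered text is meant for composing query strings from these fixed constants only, never from user input.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: single source for the namespace mappings listed above.
final class EformsNamespaces {
    private EformsNamespaces() {}

    // LinkedHashMap keeps the prefixes in the order shown in the SQL array.
    static final Map<String, String> ALL = new LinkedHashMap<>();
    static {
        ALL.put("cbc", "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2");
        ALL.put("cac", "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2");
        ALL.put("ext", "urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2");
        ALL.put("efac", "http://data.europa.eu/p27/eforms-ubl-extension-aggregate-components/1");
        ALL.put("efbc", "http://data.europa.eu/p27/eforms-ubl-extension-basic-components/1");
    }

    /** Renders the map as the ARRAY[ARRAY[prefix, uri], ...] argument of xpath(). */
    static String toSqlArray(Map<String, String> namespaces) {
        return namespaces.entrySet().stream()
            .map(e -> "ARRAY['" + e.getKey() + "', '" + e.getValue() + "']")
            .collect(Collectors.joining(", ", "ARRAY[", "]"));
    }
}
```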

Performance tips

  1. Indexing: for frequent XPath queries you can create expression indexes. The result is cast to text[] because xml[] has no GIN operator class:
CREATE INDEX idx_doc_title ON ted.procurement_document
    USING GIN ((xpath('//cbc:Title/text()', xml_document, ...)::text[]));
  2. Materialized views: for complex XPath queries:
CREATE MATERIALIZED VIEW ted.document_titles AS
SELECT
    id,
    unnest(xpath('//cbc:Title/text()', xml_document, ...))::text as title
FROM ted.procurement_document;

Advantages of the native XML column

- Native XPath queries
- Well-formedness is checked on insert (PostgreSQL does not validate against a DTD or XML Schema)
- Efficient storage
- PostgreSQL XML functions are available
- Structured queries on XML elements
- Expression indexes are possible

Hibernate now works correctly

With @JdbcTypeCode(SqlTypes.SQLXML), Hibernate knows that it has to bind the value as SQLXML for INSERT/UPDATE.

This prevents the error:

ERROR: column "xml_document" is of type xml but expression is of type character varying