extract text from XML

From: Chris Pacejo <chris(at)pacejo(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: extract text from XML
Date: 2016-08-08 05:30:39
Message-ID: 1470634239.713742.688783561.46752730@webmail.messagingengine.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, I have found a basic use case which is supported by the xml2 module,
but is unsupported by the new XML API.

It is not possible to correctly extract text (either from text nodes or
attribute values) which contains the characters '<', '&', or '>'.
xpath() (correctly) returns XML text nodes for queries targeting these
node types, and there is no inverse to xmlelement(). For example:

=> select (xpath('/a/text()', xmlelement(name a, '<&>')))[1]::text;
xpath
-----------
&lt;&amp;&gt;
(1 row)

Again, not a bug; but there is no way to specify my desired intent. The
xml2 module does provide such a function, xpath_string:

=> select xpath_string(xmlelement(name a, '<&>')::text, '/a/text()');
xpath_string
--------------
<&>
(1 row)

One workaround is to return the node's text value by serializing the XML
value, and textually replacing those three entities with the characters
they represent, but this relies on the xpath() function not generating
other entities.

(My use case is importing data in XML format, and processing with
Postgres into a relational format.)

Perhaps a function xpath_value(text, xml) -> text[] would close the gap?
(I did search and no such function seems to exist currently, outside
xml2.)

Thanks,
Chris

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2016-08-08 05:40:47 Re: Declarative partitioning
Previous Message Alvaro Herrera 2016-08-08 04:53:58 Re: No longer possible to query catalogs for index capabilities?