Thread: extract text from XML

extract text from XML

From
Chris Pacejo
Date:
Hi, I have found a basic use case which is supported by the xml2 module,
but is unsupported by the new XML API.

It is not possible to correctly extract text (either from text nodes or
attribute values) which contains the characters '<', '&', or '>'. 
xpath() (correctly) returns XML text nodes for queries targeting these
node types, and there is no inverse to xmlelement().  For example:

=> select (xpath('/a/text()', xmlelement(name a, '<&>')))[1]::text;
     xpath
---------------
 &lt;&amp;&gt;
(1 row)

Again, not a bug; but there is no way to specify my desired intent.  The
xml2 module does provide such a function, xpath_string:

=> select xpath_string(xmlelement(name a, '<&>')::text, '/a/text()');
 xpath_string
--------------
 <&>
(1 row)

One workaround is to return the node's text value by serializing the XML
value, and textually replacing those three entities with the characters
they represent, but this relies on the xpath() function not generating
other entities.
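
A minimal sketch of that workaround, using the same input as the examples
above. It assumes xpath() emits only the three entities named here, which
is exactly the fragile part:

```sql
-- Sketch only: serialize the text node, then undo the three entities.
-- '&amp;' is replaced last; doing it first would turn an escaped
-- literal such as '&amp;lt;' into '<' instead of '&lt;'.
select replace(replace(replace(
         (xpath('/a/text()', xmlelement(name a, '<&>')))[1]::text,
         '&lt;', '<'),
         '&gt;', '>'),
         '&amp;', '&') as unescaped;
```

For this input the query should return <&>. Any other entity or
character reference in the serialized node would pass through unchanged.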

(My use case is importing data in XML format, and processing with
Postgres into a relational format.)

Perhaps a function xpath_value(text, xml) -> text[] would close the gap?
(I did search, and no such function seems to exist currently outside
xml2.)
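
To make the proposal concrete, here is a rough user-level approximation
built on the entity-replacement workaround. Only the name and signature
come from the suggestion above. The body and the parameter names
(xpath_expr, doc) are my own assumptions, and it needs "with ordinality"
(9.4 or later):

```sql
-- Hypothetical sketch, not an existing PostgreSQL function.
-- Assumes the XPath expression selects text or attribute nodes only;
-- for element nodes the replace() calls would corrupt the markup.
create function xpath_value(xpath_expr text, doc xml) returns text[]
language sql immutable as $$
  select array_agg(
           replace(replace(replace(node::text,
             '&lt;', '<'), '&gt;', '>'), '&amp;', '&')
           order by ord)            -- keep document order
  from unnest(xpath(xpath_expr, doc)) with ordinality as t(node, ord)
$$;

-- Usage: select xpath_value('/a/text()', xmlelement(name a, '<&>'));
-- Note: yields NULL rather than an empty array when nothing matches.
```

A built-in version would presumably take the string value of each node
directly from the XML library instead of unescaping serialized output.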

Thanks,
Chris



Re: extract text from XML

From
Tobias Bussmann
Date:
> I have found a basic use case which is supported by the xml2 module,
> but is unsupported by the new XML API.
> It is not possible to correctly extract text

Indeed. I came across this shortcoming myself some months ago, but reporting it here, as the deprecation notice at
https://www.postgresql.org/docs/devel/static/xml2.html#AEN180625 asks for, was still an item on my to-do list. Done,
thanks ;)

I did some archive browsing on that topic. The issue (if you want to call it that) was introduced by a patch to
ensure xpath() always returns xml, applied for 9.2 after some discussion:
https://www.postgresql.org/message-id/201106291934.23089.rsmogura%40softperience.eu
It has been known since then:
https://www.postgresql.org/message-id/1409795403248-5817667.post%40n5.nabble.com
The new behaviour was later reported as a bug and discussed again:
https://www.postgresql.org/message-id/CAAY5AM1L83y79rtOZAUJioREO6n4%3DXAFKcGu6qO3hCZE1yJytg%40mail.gmail.com

Anyhow, (un)escaping functions to support the text<->xml conversion are often talked about but still seem to be
found only in the xml2 module. Having seen a patch implementing xmltable here recently, I think these functions would
be another step towards finally making the contrib module obsolete.

> Perhaps a function xpath_value(text, xml) -> text[] would close the gap?

Such a design, resembling the xml2 behaviour, would certainly fit the need, imho.

regards
Tobias