> On 2018-06-21, at 22:27 , Alvaro Herrera wrote: > > First one is about array indexes not working sanely (I couldn't get this > to work in Oracle) > >>> Also, array indexes behave funny. First let's add more XML comments >>> inside that number, and query for the subscripts: >>> >>> update xmldata set data = regexp_replace(data::text, '791', '791')::xml; >>> >>> SELECT xmltable.* >>> FROM (SELECT data FROM xmldata) x, >>> LATERAL XMLTABLE('/ROWS/ROW' >>> PASSING data >>> COLUMNS >>> country_name text PATH 'COUNTRY_NAME/text()' NOT NULL, >>> size_text float PATH 'SIZE/text()', >>> size_text_1 float PATH 'SIZE/text()[1]', >>> size_text_2 float PATH 'SIZE/text()[2]', >>> "SIZE" float, size_xml xml PATH 'SIZE') >>> where size_text is not null; >>> >>> country_name │ size_text │ size_text_1 │ size_text_2 │ size_text_3 │ SIZE │ size_xml >>> ──────────────┼───────────┼─────────────┼─────────────┼─────────────┼──────┼─────────────────────────────────────────────────────── >>> Singapore │ 791 │ 791 │ 91 │ 1 │ 791 │ 791 >>> (1 fila) The concatenation was caused by the use of xmlNodeListGetString, which I have completely eliminated because it was even wrong for element nodes. As per my understanding (and how Oracle and DB2 do it), text nodes contained in nested elements must also be included in the string representation. This is an incompatible change that needs adoption of an regression test: SELECT * FROM xmltable('/root' passing 'a1aa2a bbbbxxxcccc' COLUMNS element text); element -------------------- a1aa2a bbbbcccc This should actually include the ‘xxx’ from the nested element as well. As I could not find a libxml function that does this, I’ve introduced a new one (xml_xmlnodetostring) in the vein of libxml’s xmlNodeListGetString. > The second one is about (lack of!) processing instructions and comments: > >> Also, node() matching comments or processing instructions >> seems to be broken too: >> >> SELECT * >> FROM (VALUES (''::xml) >> , (''::xml) >> ) d(x) >> CROSS JOIN LATERAL >> XMLTABLE('/xml' >> PASSING x >> COLUMNS "node()" TEXT PATH 'node()' >> ) t >> >> x | node() >> ---------------------------+-------- >> | >> | >> (2 rows) >> This was easy to fix: PI’s and comments contain their data in their own content, not in a child. > The third issue is the way we output comments when they're in a column > of type XML: > >>> Note what happens if I change the type from text to xml in that >>> column: >>> >>> SELECT * >>> FROM (VALUES ('text'::xml) >>> , (' weird ]]>'::xml) >>> ) d(x) >>> CROSS JOIN LATERAL >>> XMLTABLE('/xml' >>> PASSING x >>> COLUMNS "node()" xml PATH 'node()' >>> ) t; >>> >>> x │ node() >>> ───────────────────────────────────────────────────────────┼──────────────────────────────────────────────── >>> text │ te ahoy xt >>> weird ]]> │ some <!-- really --> weird <stuff> >>> (2 filas) I included comments, CDATA sections and PIs into the list of nodes that need to use libxml's xmlNodeDump. The 0001 patch also removes the distinction between the case of "count == 1 && typid == XMLOID” and "count > 1" because I don’t see any reason for two different code paths there (except that one string copy is saved in the count == 1 case). The 0001 patch also adds regression tests for all three issues. There is a second patch, for which I don’t have strong opinions, that eliminates the use of xmlStrdup in two cases. As far as I can see, the xmlStrdup is only there so the xmlFree at the end can be done unconditionally. I found it disturbing to do a strdup for the sole purpose it can be free-ed again. However, the patch is not a beauty, just don’t apply if there is any doubt. -markus