Re: patch: function xmltable

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-27 01:34:47
Message-ID: CAMsr+YE0aY+x5qDeQ8SPv0DBza_aATRyNWO19Gbdmvua3CxdrQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 24 September 2016 at 14:01, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

>> Did some docs copy-editing and integrated some examples. Explained how
>> nested elements work, that multiple top level elements is an error,
>> etc. Explained the time-of-evaluation stuff. Pointed out that you can
>> refer to prior output columns in PATH and DEFAULT, since that's weird
>> and unusual compared to normal SQL. Documented handling of multiple
>> node matches, including the surprising results of somepath/text() on
>> <somepath>x<!--blah-->y</somepath>. Documented handling of nested
>> elements. Documented that xmltable works only on XML documents, not
>> fragments/forests.
>
>
> I don't understand to this sentence: "It is possible for a PATH expression
> to reference output columns that appear before it in the column-list, so
> paths may be dynamically constructed based on other parts of the XML
> document:"

>> The docs and tests don't seem to cover XML entities. What's the
>> behaviour there? Core XML only defines one entity, but if a schema
>> defines more how are they processed? The tests need to cover the
>> predefined entities &quot; &amp; &apos; &lt; and &gt; at least.
>
>
> I don't understand, what you are propose here. ?? Please, can you send some
> examples.

Per below - handling of DTD <!ENTITY> declarations, and the builtin
entity tests I already added tests for.

>> It doesn't seem to cope with internal DTDs at all (libxml2 limitation?):
>>
>> SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0"
>> standalone="yes" ?>
>> <!DOCTYPE foo [
>> <!ELEMENT foo (#PCDATA)>
>> <!ENTITY pg "PostgreSQL">
>> ]>
>> <foo>Hello &pg;.</foo>
>> $XML$ COLUMNS foo text);
>>
>> + ERROR: invalid XML content
>> + LINE 1: SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" ...
>> + ^
>> + DETAIL: line 2: StartTag: invalid element name
>> + <!DOCTYPE foo [
>> + ^
>> + line 3: StartTag: invalid element name
>> + <!ELEMENT foo (#PCDATA)>
>> + ^
>> + line 4: StartTag: invalid element name
>> + <!ENTITY pg "PostgreSQL">
>> + ^
>> + line 6: Entity 'pg' not defined
>> + <foo>Hello &pg;.</foo>
>> + ^
>>
>
> It is rejected before XMLTABLE function call
>
> postgres=# select $XML$<?xml version="1.0" standalone="yes" ?>
> postgres$# <!DOCTYPE foo [
> postgres$# <!ELEMENT foo (#PCDATA)>
> postgres$# <!ENTITY pg "PostgreSQL">
> postgres$# ]>
> postgres$# <foo>Hello &pg;.</foo>
> postgres$# $XML$::xml;
> ERROR: invalid XML content
> LINE 1: select $XML$<?xml version="1.0" standalone="yes" ?>
> ^
> DETAIL: line 2: StartTag: invalid element name
> <!DOCTYPE foo [
[snip]

> It is disabled by default in libxml2. I found a function
> xmlSubstituteEntitiesDefault http://www.xmlsoft.org/entities.html
> http://www.xmlsoft.org/html/libxml-parser.html#xmlSubstituteEntitiesDefault
>
> The default behave should be common for all PostgreSQL's libxml2 based
> function - and then it is different topic - maybe part for PostgreSQL ToDo?
> But I don't remember any user requests related to this issue.

OK, so it's not xmltable specific. Fine by me.

Somebody who cares can deal with it. There's clearly nobody breaking
down the walls wanting the feature.

> I removed this tests - it is not related to XMLTABLE function, but to
> generic XML processing/validation.

Good plan.

>> In +ExecEvalTableExpr, shouldn't you be using PG_ENSURE_ERROR_CLEANUP
>> instead of a PG_TRY() / PG_CATCH() block?
>
>
> If I understand to doc, the PG_ENSURE_ERROR_CLEANUP should be used, when you
> want to catch FATAL errors (and when you want to clean shared memory).
> XMLTABLE doesn't use shared memory, and doesn't need to catch fatal errors.

Ok, makes sense.

>> I don't have the libxml knowledge or remaining brain to usefully
>> evaluate the xpath and xml specifics in xpath.c today. It does strike
>> me that the new xpath parser should probably live in its own file,
>> though.
>
> moved

Thanks.

> new version is attached

Great.

I'm marking this ready for committer at this point.

I think the XML parser likely needs a more close reading, so I'll ping
Peter E to see if he'll have a chance to check that bit out. But by
and large I think the issues have been ironed out - in terms of
functionality, structure and clarity I think it's looking solid.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2016-09-27 01:41:53 Re: pgbench - allow to store select results into variables
Previous Message Craig Ringer 2016-09-27 01:23:21 Re: [PATCH] Transaction traceability - txid_status(bigint)