From: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | Craig Ringer <craig(at)2ndquadrant(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: patch: function xmltable |
Date: | 2016-09-23 08:29:18 |
Message-ID: | CAFj8pRCjmsh11c+wqsAjSGA=1sXxmRAZqC86Tz-7Qcjg3-zcQw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
2016-09-23 10:05 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:
> On 22 September 2016 at 02:31, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
>
> > another small update - fix XMLPath parser - support multibytes characters
>
> I'm returning for another round of review.
>
> The code doesn't handle the 5 XML built-in entities correctly in
> text-typed output. It processes ' and " but not &, < or
> > . See added test. I have not fixed this, but I think it's clearly
> broken:
>
>
> + -- XML builtin entities
> + SELECT * FROM xmltable('/x/a' PASSING
> '<x><a><ent>'</ent></a><a><ent>"</ent></a><a><
> ent>&</ent></a><a><ent><</ent></a><a><ent>></ent></a></x>'
> COLUMNS ent text);
> + ent
> + -------
> + '
> + "
> + &
> + <
> + >
> + (5 rows)
>
> so I've adjusted the docs to claim that they're expanded. The code
> needs fixing to avoid entity-escaping when the output column type is
> not 'xml'.
>
>
> ' and " entities in xml-typed output are expanded, not
> preserved. I don't know if this is intended but I suspect it is:
>
> + SELECT * FROM xmltable('/x/a' PASSING
> '<x><a><ent>'</ent></a><a><ent>"</ent></a><a><
> ent>&</ent></a><a><ent><</ent></a><a><ent>></ent></a></x>'
> COLUMNS ent xml);
> + ent
> + ------------------
> + <ent>'</ent>
> + <ent>"</ent>
> + <ent>&</ent>
> + <ent><</ent>
> + <ent>></ent>
> + (5 rows)
>
>
> For the docs changes relevant to the above search for "The five
> predefined XML entities". Adjust that bit of docs if I guessed wrong
> about the intended behaviour.
>
> The tests don't cover CDATA or PCDATA . I didn't try to add that, but
> they should.
>
>
> Did some docs copy-editing and integrated some examples. Explained how
> nested elements work, that multiple top level elements is an error,
> etc. Explained the time-of-evaluation stuff. Pointed out that you can
> refer to prior output columns in PATH and DEFAULT, since that's weird
> and unusual compared to normal SQL. Documented handling of multiple
> node matches, including the surprising results of somepath/text() on
> <somepath>x<!--blah-->y</somepath>. Documented handling of nested
> elements. Documented that xmltable works only on XML documents, not
> fragments/forests.
>
> Regarding evaluation time, it struck me that evaluating path
> expressions once per row means the xpath must be parsed and processed
> once per row. Isn't it desirable to store and re-use the preparsed
> xpath? I don't think this is a major problem, since we can later
> detect stable/immutable expressions including constants, evaluate only
> once in that case, and cache. It's just worth thinking about.
>
> The docs and tests don't seem to cover XML entities. What's the
> behaviour there? Core XML only defines one entity, but if a schema
> defines more how are they processed? The tests need to cover the
> predefined entities " & ' < and > at least.
>
> I have no idea whether the current code can fetch a DTD and use any
> <!ENTITY > declarations to expand entities, but I'm guessing not? If
> not, external DTDs, and internal DTDs with external entities should be
> documented as unsupported.
>
> It doesn't seem to cope with internal DTDs at all (libxml2 limitation?):
>
> SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0"
> standalone="yes" ?>
> <!DOCTYPE foo [
> <!ELEMENT foo (#PCDATA)>
> <!ENTITY pg "PostgreSQL">
> ]>
> <foo>Hello &pg;.</foo>
> $XML$ COLUMNS foo text);
>
> + ERROR: invalid XML content
> + LINE 1: SELECT * FROM xmltable('/' PASSING $XML$<?xml version="1.0" ...
> + ^
> + DETAIL: line 2: StartTag: invalid element name
> + <!DOCTYPE foo [
> + ^
> + line 3: StartTag: invalid element name
> + <!ELEMENT foo (#PCDATA)>
> + ^
> + line 4: StartTag: invalid element name
> + <!ENTITY pg "PostgreSQL">
> + ^
> + line 6: Entity 'pg' not defined
> + <foo>Hello &pg;.</foo>
> + ^
>
>
> libxml seems to support documents with internal DTDs:
>
> $ xmllint --valid /tmp/x
> <?xml version="1.0" standalone="yes"?>
> <!DOCTYPE foo [
> <!ELEMENT foo (#PCDATA)>
> <!ENTITY pg "PostgreSQL">
> ]>
> <foo>Hello &pg;.</foo>
>
>
> so presumably the issue lies in the xpath stuff? Note that it's not
> even ignoring the DTD and choking on the undefined entity, it's
> choking on the DTD its self.
>
>
> OK, code comments:
>
>
> In +ExecEvalTableExpr, shouldn't you be using PG_ENSURE_ERROR_CLEANUP
> instead of a PG_TRY() / PG_CATCH() block?
>
>
> I think the new way you handle the type stuff is much, much better,
> and with comments to explain too. Thanks very much.
>
>
> There's an oversight in tableexpr vs xmltable separation here:
>
> + case T_TableExpr:
> + *name = "xmltable";
> + return 2;
>
> presumably you need to look at the node and decide what kind of table
> expression it is or just use a generic "tableexpr".
>
> Same problem here:
>
> + case T_TableExpr:
> + {
> + TableExpr *te = (TableExpr *) node;
> +
> + /* c_expr shoud be closed in brackets */
> + appendStringInfoString(buf, "XMLTABLE(");
>
>
This is correct, but not well commented - looks on XMLEXPR node - TableExpr
is a holder, but it is invisible for user. User running a XMLTABLE function
and should to see XMLTABLE. It will be more clean when we will support
JSON_TABLE function.
>
>
> I don't have the libxml knowledge or remaining brain to usefully
> evaluate the xpath and xml specifics in xpath.c today. It does strike
> me that the new xpath parser should probably live in its own file,
> though.
>
I'll try move it to separate file
>
> I think this is all a big improvement. Barring the notes above and my
> lack of review of the guts of the xml.c parts of it, I'm pretty happy
> with what I see now.
>
Thank you. I hope so all major issues are solved. Probably some XML
specific related issues are there - but I am happy, so you have well XML
knowledge and you will test a corner cases.
Regards
Pavel
>
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>
From | Date | Subject | |
---|---|---|---|
Next Message | Aleksander Alekseev | 2016-09-23 09:08:14 | [PATCH] Remove redundant if clause in standbydesc.c |
Previous Message | Craig Ringer | 2016-09-23 08:07:24 | Re: patch: function xmltable |