Re: patch: function xmltable

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-15 11:31:43
Message-ID: CAFj8pRDgrHTAhXNXKg-XeyTXYyUOfqWTmvkLS8_6cOCYVimpEQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2016-09-12 8:46 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:

> On 12 September 2016 at 13:07, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
>
> >> Out of interest, should the syntax allow room for future expansion to
> >> permit reading from file rather than just string literal / column
> >> reference? It'd be ideal to avoid reading big documents wholly into
> >> memory when using INSERT INTO ... SELECT XMLTABLE (...) . I don't
> >> suggest adding that to this patch, just making sure adding it later
> >> would not cause problems.
> >
> >
> > this is little bit different question - it is server side function, so
> first
> > question is - how to push usually client side content to server? Next
> > question is how to get this content to a executor. Now only COPY
> statement
> > is able to do.
>
> Probably start with support for server-side files. When people are
> dealing with really big files they'll be more willing to copy files to
> the server or bind them into the server file system over the network.
>
> The v3 protocol doesn't really allow any way for client-to-server
> streaming during a query, I think that's hopeless until we have a
> protocol bump.
>
> > updated patch attached - with your documentation.
>
>
> >> - Can you send the sample data used to generate the example output?
> >> I'd like to include at least a cut down part of it in the docs to make
> >> it clear how the input correlates with output, and preferably put the
> >> whole thing in an appendix.
> >
> >
> > it is in regress tests.
>
> Makes sense.
>
> It's not that verbose (for XML) and I wonder if it's just worth
> including it in-line in the docs along with the XMLTABLE example. It'd
> be much easier to understand how XMLTABLE works and what it does then.
>
> >> - What effect does xmlnamespaces clause have? Does supplying it allow
> >> you to reference qualified names in xpath? What happens if you don't
> >> specify it for a document that has namespaces or don't define all the
> >> namespaces? What if you reference an undefined namespace in xpath?
> >> What about if an undefined namespace isn't referenced by xpath, but is
> >> inside a node selected by an xpath expression?
> >
> >
> > All this is under libxml2 control - when you use undefined namespace,
> then
> > libxml2 raises a error. The namespaces in document and in XPath queries
> are
> > absolutely independent - the relation is a URI. When you use bad URI
> > (referenced by name), then the result will be empty set. When you use
> > undefined name, then you will get a error.
>
> OK, makes sense.
>
> >> - What are the rules for converting the matched XML node into a value?
> >> If the matched node is not a simple text node or lacks a text node as
> >> its single child, what happens?
> >
> > This process is described and controlled by "XML SQL mapping". The
> Postgres
> > has minimalistic implementation without possibility of external control
> and
> > without schema support. The my implementation is simple. When user
> doesn't
> > specify result target like explicit using of text() function, then the
> > text() function is used implicitly when target type is not XML. Then I
> dump
> > result to string and I enforce related input functions for target types.
>
> OK, so a subset of the full spec functionality is provided because of
> limitations in Pg and libxml2. Makes sense.
>
> My only big concern here is that use of text() is a common mistake in
> XSLT, and I think the same thing will happen here. Users expect
> comments to be ignored, but in fact a comment inserts a comment node
> into the XML DOM, so a comment between two pieces of text produces a
> tree of
>
> element
> text()
> comment()
> text()
>
>
> If you match element/text(), you get a 2-node result and will presumably
> ERROR.
>
> There is no good way to tell this from
>
> element
> text()
> element
> text()
>
> when you use an xpath expression like element/text() . So you can't
> safely solve it just by concatenating all resulting text() nodes
> without surprising behaviour.
>
> >> - What happens if the matched has multiple text node children? This
> >> can happen if, for example, you have something like
> >>
> >> <matchedNode>
> >> some text <!-- comment splits up text node --> other text
> >> </matchedNode>
> >
>

I fixed this case - new regress tests added

> >
> > depends on target type - it is allowed in XML, and it is disallowed for
> > other types. I though about support of a arrays - but the patch will be
> much
> > more complex - there can be recursion - so I disallowed it. When the user
> > have to solve this issue, then he can use nested XMLTABLE functions and
> > nested function is working with XML type.
>
> I don't really understand how that'd work.
>
> Do you know how other implementations handle this?
>
> I think users are going to be VERY surprised when comments in text
> break their XML.
>
> >> - Is there a way to get an attribute as a value? If so, an example
> >> should show this because it's going to be a common need. Presumably
> >> you want node/@attrname ?
> >
> >
> > you can use reference to current node "." - so "./@attname" should to
> work -
> > a example is in regress tests
>
> cool, just needs mention in docs then.
>
> >> - PASSING clause isn't really defined. You can specify one PASSING
> >> entry as a literal/colref/expression, and it's the argument xml
> >> document, right? The external docs you referred to say that PASSING
> >> may have a BY VALUE keyword, alias its argument with AS, and may have
> >> expressions, e.g.
> >>
> >> PASSING BY VALUE '<x/>' AS a, '<y/>' AS b
> >>
> >> Neither aliases nor multiple entries are supported by the code or
> >> grammar. Should this be documented as a restriction?
> >
> >
> > The ANSI allows to pass more documents - and then do complex queries with
> > XQuery. Passing more than one document has not sense in libxml2 based
> > implementation, so I didn't supported it. The referenced names can be
> > implemented later - but it needs to changes in XPATH function too.
>
> OK, so my docs addition that just says they're not supported should be
> fine.
>
> >> - What does BY REF mean? Should this just be mentioned with a "see
> >> xmlexists(...)" since it seems to be compatibility noise? Is there a
> >> corresponding BY VALUE or similar?
> >
> > When the XML document is stored as serialized DOM, then by ref means
> link on
> > this DOM. It has not sense in Postgres - because we store XML documents
> by
> > value only.
>
> Right. And since there's already precent for xmlexists there's no
> point worrying about whether we lose opportunities to implement it
> later.
>
> >> - Why do the expression arguments take c_expr (anything allowed in
> >> a_expr or b_expr), not b_expr (restricted expression) ?
> >
> >
> > I don't know - I expect the problems with parser - because PASSING is
> > restricted keyword in ANSI/SQL and unreserved keyword in Postgres.
>
> I mean for the rowpath argument, not the parts within
> xmlexists_argument. If the rowpath doesn't need to be c_expr
> presumably it should be a b_expr or even, if it doesn't cause parsing
> ambiguities, an a_expr ? There doesn't seem to be the same issue here
> as we have with BETWEEN etc.
>

b_expr enforces shift/reduce conflict :(

>
> >> - Column definitions are underdocumented. The grammar says they can be
> >> NOT NULL, for example, but I don't see that in any of the references
> >> you mailed me nor in the docs. What behaviour is expected for a NOT
> >> NULL column? I've documented my best guess (not checked closely
> >> against the code, please verify).
> >>
> >
> > yes - some other databases allows it - I am thinking so it is useful.
>
> Sure. Sounds like my docs additions are probably right then, except
> for incorrect description of DEFAULT.
>

I found other opened question - how we can translate empty tag to SQL
value? The Oracle should not to solve this question, but PostgreSQL does.
Some databases returns empty string.

I prefer return a empty string - not null in this case. The reason is
simple - Empty string is some information - and NULL is less information.
When it is necessary I can transform empty string to NULL - different
direction is not unique.

Regards

Pavel

>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

Attachment Content-Type Size
xmltable-6.patch.gz application/x-gzip 23.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2016-09-15 11:55:11 pgsql: Support OpenSSL 1.1.0.
Previous Message Rahila Syed 2016-09-15 11:29:56 Re: Surprising behaviour of \set AUTOCOMMIT ON