Re: patch: function xmltable

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-12 05:48:35
Message-ID: CAFj8pRA=BjeNe5+Ad9WgHZ4tW=5vVvLb2qb8Og54+ye1n++E8w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2016-09-12 6:28 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:

> > I'll take a closer read-through shortly.
>
>
> Missing file. You omitted executor/tableexpr.h from the patch, so I
> can't compile.
>
> I've expanded and copy-edited the docs. Some of it is guesswork based
> on the references you sent and a glance at the code. Please check my
> changes carefully. I found a few surprises, like the fact that DEFAULT
> isn't a normal literal, it's an xpath expression evaluated at the same
> time as the rowexpression.
>
> Updated patch attached as XMLTABLE-v3 includes the docs changes. Note
> that it's missing tableexpr.h. For convenient review or to apply to
> your working tree I also attach a diff of just my docs changes as
> proposed-docs-changes.diff.
>
>
> Docs:
>
> - Can you send the sample data used to generate the example output?
> I'd like to include at least a cut down part of it in the docs to make
> it clear how the input correlates with output, and preferably put the
> whole thing in an appendix.
>

it is in regress tests.

>
> - How does it decide what subset of the document to iterate over?
> That's presumably rowexpr, which is xpath in postgresql? (I added this
> to docs).
>
> - xmlnamespaces clause in docs needs an example for a non-default
> namespace.
>
> - What effect does xmlnamespaces clause have? Does supplying it allow
> you to reference qualified names in xpath? What happens if you don't
> specify it for a document that has namespaces or don't define all the
> namespaces? What if you reference an undefined namespace in xpath?
> What about if an undefined namespace isn't referenced by xpath, but is
> inside a node selected by an xpath expression?
>

All this is under libxml2 control - when you use undefined namespace, then
libxml2 raises a error. The namespaces in document and in XPath queries are
absolutely independent - the relation is a URI. When you use bad URI
(referenced by name), then the result will be empty set. When you use
undefined name, then you will get a error.

>
> - What are the rules for converting the matched XML node into a value?
> If the matched node is not a simple text node or lacks a text node as
> its single child, what happens?
>

This process is described and controlled by "XML SQL mapping". The Postgres
has minimalistic implementation without possibility of external control and
without schema support. The my implementation is simple. When user doesn't
specify result target like explicit using of text() function, then the
text() function is used implicitly when target type is not XML. Then I dump
result to string and I enforce related input functions for target types.

>
> - What happens if the matched has multiple text node children? This
> can happen if, for example, you have something like
>
> <matchedNode>
> some text <!-- comment splits up text node --> other text
> </matchedNode>
>

depends on target type - it is allowed in XML, and it is disallowed for
other types. I though about support of a arrays - but the patch will be
much more complex - there can be recursion - so I disallowed it. When the
user have to solve this issue, then he can use nested XMLTABLE functions
and nested function is working with XML type.

Just for record - This issue is solved in JSON_TABLE functions - it allows
nested PATHS. But XMLTABLE doesn't allow it.

>
> - Is there a way to get an attribute as a value? If so, an example
> should show this because it's going to be a common need. Presumably
> you want node/@attrname ?
>

you can use reference to current node "." - so "./@attname" should to work
- a example is in regress tests

>
> - What happens if COLUMNS is not specified at all? It looks like it
> returns a single column result set with the matched entries as 'xml'
> type, so added to docs, please verify.
>

sure, that is it

>
>
> - PASSING clause isn't really defined. You can specify one PASSING
> entry as a literal/colref/expression, and it's the argument xml
> document, right? The external docs you referred to say that PASSING
> may have a BY VALUE keyword, alias its argument with AS, and may have
> expressions, e.g.
>
> PASSING BY VALUE '<x/>' AS a, '<y/>' AS b
>
> Neither aliases nor multiple entries are supported by the code or
> grammar. Should this be documented as a restriction? Do you know if
> that's an extension by the other implementation or if it's SQL/XML
> standard? (I've drafted a docs update to cover this in the updated
> patch).
>

The ANSI allows to pass more documents - and then do complex queries with
XQuery. Passing more than one document has not sense in libxml2 based
implementation, so I didn't supported it. The referenced names can be
implemented later - but it needs to changes in XPATH function too.

>
>
> - What does BY REF mean? Should this just be mentioned with a "see
> xmlexists(...)" since it seems to be compatibility noise? Is there a
> corresponding BY VALUE or similar?
>

When the XML document is stored as serialized DOM, then by ref means link
on this DOM. It has not sense in Postgres - because we store XML documents
by value only.

>
>
> - The parser definitions re-use xmlexists_argument . I don't mind
> that, but it's worth noting here in case others do.
>

It is one clause - see SQL/XML doc PASSING <XML table argument passing
mechanism>

>
> - Why do the expression arguments take c_expr (anything allowed in
> a_expr or b_expr), not b_expr (restricted expression) ?
>

I don't know - I expect the problems with parser - because PASSING is
restricted keyword in ANSI/SQL and unreserved keyword in Postgres.

>
>
> - Column definitions are underdocumented. The grammar says they can be
> NOT NULL, for example, but I don't see that in any of the references
> you mailed me nor in the docs. What behaviour is expected for a NOT
> NULL column? I've documented my best guess (not checked closely
> against the code, please verify).
>
>
yes - some other databases allows it - I am thinking so it is useful.

> -
>
>
>
>
> Test suggestions:
>
> - Coverage of multiple text() node children of an element, where split
> up by comment or similar
>
> - Coverage of xpath that matches a node with child element nodes
>

I'll do it.

>
>
>
> More to come. Please review my docs changes in the mean time. I'm
> spending a lot more time on this than I expected so I might have to
> get onto other things for a while too.
>
>
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2016-09-12 05:52:55 Re: patch: function xmltable
Previous Message Pavel Stehule 2016-09-12 05:07:18 Re: patch: function xmltable