Re: patch: function xmltable

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: function xmltable
Date: 2016-09-06 20:13:15
Message-ID: CAFj8pRB5NAOgzQPPZD0uXX3LBG6uE9N+PKyf0uf9NJ5VHiBs-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

2016-09-06 6:54 GMT+02:00 Craig Ringer <craig(at)2ndquadrant(dot)com>:

> On 4 September 2016 at 16:06, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
> > Hi
> >
> > minor update - using DefElem instead own private parser type
>
> I'm really glad that you're doing this and I'll take a look at it for this
> CF.
>
> It's quite a big patch so I expect this will take a few rounds of
> review and updating.
>

Thank you for review

>
>
> Patch applies cleanly and builds cleanly on master both with and
> without --with-xml .
>
> Overall, I think this needs to be revised with appropriate comments.
> Whitespace/formatting needs fixing since it's all over the place.
> Documentation is insufficient (per notes below).
>

I am not able to write documentation in English language :( - This function
is pretty complex - so I hope so anybody with better language skills can
help with this. It respects standard and it respects little bit different
Oracle's behave too (different order of DEFAULT and PATH parts).

>
> Re identifier naming, some of this code uses XmlTable naming patterns,
> some uses TableExpr prefixes. Is that intended to indicate a bounary
> between things re-usable for other structured data ingesting
> functions? Do you expect a "JSONEXPR" or similar in future? That's
> alluded to by
>

This structure should be reused by JSON_TABLE function. Now, it is little
bit strange, because there is only XMLTABLE implementation - and I have to
choose between a) using two different names now, b) renaming some part in
future.

And although XMLTABLE and JSON_TABLE functions are pretty similar - share
90% of data (input value, path, columns definitions), these functions has
different syntax - so only middle level code should be shared.

>
> +/*----------
> + * TableExpr - used for XMLTABLE function
> + *
> + * This can be used for json_table, jsonb_table functions in future
> + *----------
> + */
> +typedef struct TableExpr
> +{
> ...
>
> If so, should this really be two patches, one to add the table
> expression infrastructure and another to add XMLTABLE that uses it?
> Also, why in that case does so much of the TableExpr code call
> directly into XmlTable code? It doesn't look very generic.
>

Currently the common part is not too big - just the Node related part - I
am not sure about necessity of two patches. I am agree, there is missing
some TableExpBuilder, where can be better isolated the XML part.

>
> Overall I find identifier naming to be a bit inconsisent and think
> it's necessary to make it clear that all the "TableExpr" stuff is for
> XMLTABLE specifically, if that's the case, or make the delineation
> clearer if not.
>
> I'd also like to see tests that exercise the ruleutils get_rule_expr
> parts of the code for the various XMLTABLE variants.
>
> Similarly, since this seems to add a new xpath parser, that needs
> comprehensive tests. Maybe re-purpose an existing xpath test data set?
>
>
sure

>
>
>
> More detailed comments:
> ====
>
> Docs comments:
>
> The <function>xmltable</function> produces [a] table based on
> [the] passed XML value.
>
> The docs are pretty minimal and don't explain the various clauses of
> XMLTABLE. What is "BY REF" ? Is PATH an xpath expression? If so, is
> there a good cross reference link available? The PASSING clause? etc.
>
> How does XMLTABLE decide what to iterate over, and how to iterate over it?
>
> Presumably the FOR ORDINALITY clause makes a column emit a numeric counter.
>
> What standard, if any, does this conform to? Does it resemble
> implementations elsewhere? What limitations or unsupported features
> does it have relative to those standards?
>
>
>
> execEvalTableExpr seems to be defined twice, with a difference in
> case. This is probably not going to fly:
>
>
> +static Datum
> +execEvalTableExpr(TableExprState *tstate,
> + ExprContext *econtext,
> + bool *isNull, ExprDoneCond *isDone)
> +{
>
> +static Datum
> +ExecEvalTableExpr(TableExprState *tstate,
> + ExprContext *econtext,
> + bool *isNull, ExprDoneCond *isDone)
> +{
>
>
> It looks like you've split the function into a "guts" and "wrapper"
> part, with the error handling PG_TRY / PG_CATCH block in the wrapper.
> That seems reasonable for readability, but the naming isn't.
>

I invite any idea how these functions should be named.

>
> A comment is needed to explain what ExecEvalTableExpr is / does. If
> it's XMLTABLE specific (which it looks like based on the code), its
> name should reflect that. This pattern is repeated elsewhere; e.g.
> TableExprState is really the state for an XMLTABLE expression. But
> PostgreSQL actually has TABLE statements, and in future we might want
> to support table-expressions, so I don't think this naming is
> appropriate. This is made worse by the lack of comments on things like
> the definition of TableExprState. Please use something that makes it
> clear it's for XMLTABLE and add appropriate comments.
>

I understand, so using TableExpr can be strange (for XMLTABLE function).
But when we will have JSON_TABLE function, then it will have a sense.

"TableExprState" is consistent with "TableExpr".

Any idea how it should be changed?

>
> Formatting of variables, arguments, function signatures etc is
> random/haphazard and doesn't follow project convention. It's neither
> aligned or unaligned in the normal way, I don't understand the random
> spacing at all. Maybe you should try to run pgindent and then extract
> just the changes related to your patch? Or run your IDE/editor's
> indent function on your changes? Right now it's actually kind of hard
> to read. Do you edit with tabstop set to 1 normally or something like
> that?
>
> There's a general lack of comments throughout the added code.
>
> In execEvalTableExpr, why are we looping over namespaces? What's that
> for? Comment would be nice.
>
> Typo: Path caclulation => Path calculation
>
> What does XmlTableSetRowPath() do? It seems to copy its argument.
> Nothing further is done with the row_path argument after it's called
> by execEvalTableExpr, so what context is that memory in and do we have
> to worry about it if it's large?
>
> execEvalTableExpr says it's doing "path calculation". What it actually
> appears to do is evaluate the path expressions, if provided, and
> otherwise use the column name as the implied path expression. (The
> docs should mention that).
>
> It's wasn't immediately obvious to me what the branch around
> tstate->for_ordinality_col is for and what the alternate path's
> purpose is in terms of XMLTABLE's behaviour, until I read the parser
> definition. That's largely because the behaviour of XMLTABLE is
> underspecified in the docs, since once you know ORDINALITY columns
> exist it's pretty obvious what it's doing.
>
> Similarly, for the alternate branch tstate->ncols , the
> XmlTableGetRowValue call there is meant to do what exactly, and
> why/under what conditions? Is it for situations where the field type
> is a whole-row value? a composite type? (I'm deliberately not studying
> this too deeply, these are points I'd like to see commented so it can
> be understood to some reasonable degree at a skim-read).
>
>
> /* result is one more columns every time */
> "one or more"
>
>
>
> /* when typmod is not valid, refresh it */
> if (te->typmod == -1)
>
>
> Is this a cache? How is it valid or not valid and when? The comment
> (thanks!) on TableExprGetTupleDesc says:
>
> /*
> * When we skip transform stage (in view), then TableExpr's
> * TupleDesc should not be valid. Refresh is necessary.
> */
>
> but I'm not really grasping what you're trying to explain here. What
> transform stage? What view? This could well be my ignorance of this
> part of the code; if it should be understandable by a reader who is
> appropriately familiar with the executor that's fine, but if it's
> specific to how XMLTABLE works some more explanation would be good.
>

This is most difficult part of this patch, and I am not sure it it is fully
correctly implemented. I use TupleDesc cache. The TupleDesc is created in
parser/transform stage. When the XMLTABLE is used in some view, then the
transformed parser tree is materialized - and when the view is used in
query, then this tree is loaded and the parser/transform stage is
"skipped". I'll check this code against implementation of ROW constructor
and I'll try to do more comments there.

>
> Good that you've got all the required node copy/in/out funcs in place.
>
> Please don't use the name "used_dns". Anyone reading that will read it
> as "domain name service" and that's actually confusing with XML
> because of XML schema lookups. Maybe used_defnamespace ? used
> def_ns?
>

good idea

>
> I haven't looked closely at keyword/parser changes yet, but it doesn't
> look like you added any reserved keywords, which is good. It does add
> unreserved keywords PATH and COLUMNS ; I'm not sure what policy for
> unreserved keywords is or the significance of that.
>
> New ereport() calls specify ERRCODEs, which is good.
>
> PostgreSQL already has XPATH support in the form of xmlexists(...)
> etc. Why is getXPathToken() etc needed? What re-use is possible here?
> There's no explanation in the patch header or comments. Should the new
> xpath parser be re-used by the existing xpath stuff? Why can't we use
> libxml's facilities? etc. This at least needs explaining in the
> submission, and some kind of hint as to why we have two different ways
> to do it is needed in the code. If we do need a new XML parser, should
> it be bundled in adt/xml.c along with a lot of user-facing
> functionality, or a separate file?
>
>
libxml2 and our XPATH function doesn't support default namespace (
http://plasmasturm.org/log/259/ ). This is pretty useful feature - so I
implemented. This is the mayor issue of libxml2 library. Another difference
between XPATH function and XMLTABLE function is using two phase searching
and implicit prefix "./" and suffix ("/text()") in XMLTABLE. XMLTABLE using
two XPATH expressions - for row data cutting and next for column data
cutting (from row data). The our XPATH functions is pretty simple mapped to
libxml2 XPATH API. But it is not possible with XMLTABLE function - due
design of this function in standard (it is more user friendly and doesn't
require exactly correct xpath expressions).

I didn't find any API in libxml2 for a work with parsed xpath expressions -
I need some info about the first and last token of xpath expression - it is
base for decision about using prefix or suffix.

This functionality (xpath expression parser) cannot be used for our XPATH
function now - maybe default namespace in future.

>
> How does XmlTableGetValue(...) and XmlTableGetRowValue(...) relate to
> this? It doesn't look like they're intended to be called directly by
> the user, and they're not documented (or commented).
>

Probably I used wrong names. XMLTABLE function is running in two different
modes - with explicitly defined columns (XmlTableGetValue is used), and
without explicitly defined columns - so result is one XML column and only
one one step searching is used (there are not column related xpath
expressions) ( XmlTableGetRowValue is used). The function XmlTableGetValue
is used for getting one column value, the function XmlTableGetRowValue is
used for getting one value too, but in special case, when there are not any
other value.

>
> I don't understand this at all:
>
>
>
> +/*
> + * There are different requests from XMLTABLE, JSON_TABLE functions
> + * on passed data than has CREATE TABLE command. It is reason for
> + * introduction special structure instead using ColumnDef.
> + */
> +typedef struct TableExprRawCol
> +{
> + NodeTag type;
> + char *colname;
> + TypeName *typeName;
> + bool for_ordinality;
> + bool is_not_null;
> + Node *path_expr;
> + Node *default_expr;
> + int location;
> +} TableExprRawCol;
>
>
>
I am sorry. It is my fault. Now we have very similar node ColumnDef. This
node is designed for usage in utility commands - and it is not designed for
usage inside a query. I had to decide between enhancing ColumnDef node or
introduction new special node. Because there are more special attributes
and it is hard to serialize current ColumnDef, I decided to use new node.

>
> That's my first-pass commentary. I'll return to this once you've had a
> chance to take a look at these and tell me all the places I got it
> wrong ;)
>
>
Thank for this

Regard

Pavel

>
>
> --
> Craig Ringer http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-09-06 20:19:30 Re: [PATCH] Alter or rename enum value
Previous Message Tom Lane 2016-09-06 20:12:51 Re: [PATCH] COPY vs \copy HINT