Re: [HACKERS] proposal - Default namespaces for XPath expressions (PostgreSQL 11)

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: pavel(dot)stehule(at)gmail(dot)com
Cc: thomas(dot)munro(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] proposal - Default namespaces for XPath expressions (PostgreSQL 11)
Date: 2018-01-29 08:50:08
Message-ID: 20180129.175008.100739543.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello.

At Wed, 24 Jan 2018 10:30:39 +0100, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote in <CAFj8pRBVUVvG1CXxgrs0UipTziUX6M788z-=L9gQvwAB4UGLeg(at)mail(dot)gmail(dot)com>
> Hi
>
> 2018-01-23 8:13 GMT+01:00 Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp
> > I have three comments on the behavior and one on documentation.
> >
> > 1. Lack of syntax handling.
> >
> > ["'" [^'] "'"] is also a string literal, but getXPathToken is
> > forgetting that and applying default namespace mistakenly to the
> > astring content.
> >
>
> In this case, I am not sure if I understand well.
>
> Within expressions, literal strings are delimited by single or double
> quotation marks, which are also used to delimit XML attributes. To avoid a
> quotation mark in an expression being interpreted by the XML processor as
> terminating the attribute value the quotation mark can be entered as a
> character reference (&quot; or &apos;). Alternatively, the expression can
> use single quotation marks if the XML attribute is delimited with double
> quotation marks or vice-versa.

I think it is correct understanding.

> So if I understand well, then XML string can looks like ' some " some ' or
> " some ' some ". I fixed it.

Thanks. It looks good.

> > 2. Additional comment might be good.
> >
> > It might be better having additional description about default
> > namespace in the comment starts from "Namespace mappings are
> > passed as text[]" in xpth_internal().
> >
>
> fixed

Thanks.

> > 3. Inconsistent behavior from named namespace.
> >
> > | - function context, aliases are <emphasis>local</emphasis>).
> > | + function context, aliases are <emphasis>local</emphasis>). Default
> > namespace has
> > | + empty name (empty string) and should be only one.
> >
> > This works as the description, on the other hand the same
> > namespace prefix can be defined twice or more in the array and
> > the last one is in effect. I don't see a reason for
> > differenciating the default namespace case.
> >
>
> It looks like libxml2 bug. There is no sense to use more than one default
> namespace, and although it is inconsistent with other namespaces, I am
> thinking so it is correct. Is better to raise error early. In this case
> Postgres expects so libxml2 ensure all namespace checks and it is tolerant.
> Default namespace is implemented inside Postgres, and I don't see any
> advantage of tolerant behave. More default namespaces is disallowed for
> XMLTABLE - so some inconsistency there should be. In this case I prefer
> raise error to signalize ambiguous or badly formatted input clearly.

Ok. I'm fine with that.

> > 4. Comments on the documentation part.
> >
> > # Even though I'm not sutable for commenting on wording...
> >
> > | + Inside predicate literals <literal>and</literal>,
> > <literal>or</literal>,
> > | + <literal>div</literal> and <literal>mod</literal> are used as
> > keywords
> > | + (XPath operators) every time and default namespace are not applied
> > there.
> >
> > *I*'d like to have a comma between the predicate and literals,
> > and have a 'a' before prediate. Or 'Literals .. inside a
> > predicate' might be better?
> >
>
> fixed
>
> >
> > 'are used as keywords' might be better being 'are identifed as
> > keywords'?
> >
>
> fixed
>
>
> > Default namespace is applied to tag names except the listed
> > keywords even inside a predicate. So 'are not applied there'
> > might be better being 'are not applied to them'? Or 'are not
> > applied in the case'?
> >
> > | + If you would to use these literals like tag names, then the
> > default namespace
> > | + should not be used, and these literals should be explicitly
> > | + labeled.
> > | + </para>
> >
> > Default namespace is not applied *only to* such keywords inside a
> > predicate. Even if an Xpath expression contains such a tag name,
> > default namespace still works for other tags. Does the following
> > make sense?
> >
> > + Use named namespace to qualify such tag names appear in an
> > + XPath predicate.
> >
> >
> fixed
>
> I hope so some native speaker finalize doc. It is out of my knowledges
> .

I am also anxious for that.

> > ===
> > After the aboves are addressed (even or rejected), I think I
> > don't have no additional comment.
> >
> >
> > - This current patch applies safely (with small shifts) on the
> > current master.
> >
> > - The code looks fine for me.
> >
> > - This patch translates the given XPath expression by prefixing
> > unprefixed tag names with a special namespace prefix only in
> > the case where default namespace is defined, so the existing
> > behavior is not affected.
> >
> > - The syntax is existing but just not implemented so I don't
> > think no arguemnts needed here.
> >
> > - It undocumentedly inhibits the usage of the namespace prefix
> > "pgdefnamespace.pgsqlxml.internal" but I believe no one can
> > notice that.
> >
> > - The default-ns translator (xpath_parser.c) seems working
> > perfectly with some harmless exceptions.
> >
> > (xpath specifications is here: https://www.w3.org/TR/1999/
> > REC-xpath-19991116/)
> >
> > Related unused features (and not documented?):
> > context variables ($n notations),
> > user-defined functions (or function names prefixed by a namespace
> > prefix)
>
>
> I did some fast check - and these features are not supported by libxml2 -
> so it is question if it should be parsed by out xpath parser. So there are
> not possible to check it, test it :(

I'm fine with that. I don't think that test for them is needed
since PostgreSQL doesn't support them anyway. Sorry for the
confusing comment. (libxml2 complains about that in the following
way.)

| =# select xpath('/a/text()=$', '<a>test</a>');
| ERROR: invalid XPath expression
!| DETAIL: Expected $ for variable reference
| CONTEXT: SQL function "xpath" statement 1

| =# select xpath('func(/a/text())', '<a>test</a>');
| ERROR: could not create XPath object
!| DETAIL: Unregistered function
| CONTEXT: SQL function "xpath" statement 1

> > Newly documented behavior:
> > the default namespace isn't applied to and/or/div/mod.
> >
> > - Dodumentation looks enough.
> >
> > - Regression test doesn't cover the XPath syntax but it's not
> > viable. I am fine with the basic test cases added by the
> > current patch.
> >
> > regards,
> >
>
> I am sending updated version.
>
> Very much thanks for very precious review

It's my pleasure. Sorry for my slow responses.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2018-01-29 08:57:36 Re: Built-in connection pooling
Previous Message Jeevan Chalke 2018-01-29 08:42:52 Re: [HACKERS] Partition-wise aggregation/grouping