Re: [HACKERS] proposal - Default namespaces for XPath expressions (PostgreSQL 11)

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] proposal - Default namespaces for XPath expressions (PostgreSQL 11)
Date: 2018-01-24 09:30:39
Message-ID: CAFj8pRBVUVvG1CXxgrs0UipTziUX6M788z-=L9gQvwAB4UGLeg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hi

2018-01-23 8:13 GMT+01:00 Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp
>:

> Hello, I returned to this.
>
> I thouroughly checked the translator's behavior against the XPath
> specifications and checkd out the documentation and regression
> test. Almost everything is fine for me and this would be the last
> comment from me.
>
> At Fri, 24 Nov 2017 18:32:43 +0100, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote in <CAFj8pRB7Fs_2DrtUTGhTmQb+KReXPH6SG62hGWO3KVL_eZYCaA@
> mail.gmail.com>
> > 2017-11-24 18:13 GMT+01:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
> >
> > >
> > >
> > > 2017-11-24 17:53 GMT+01:00 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
> > >
> > >> Hi
> > >>
> > >> 2017-11-22 22:49 GMT+01:00 Thomas Munro <
> thomas(dot)munro(at)enterprisedb(dot)com>:
> > >>
> > >>> On Thu, Nov 9, 2017 at 10:11 PM, Pavel Stehule <
> pavel(dot)stehule(at)gmail(dot)com>
> > >>> wrote:
> > >>> > Attached new version.
> > >>>
> > >>> Hi Pavel,
> > >>>
> > >>> FYI my patch testing robot says[1]:
> > >>>
> > >>> xml ... FAILED
> > >>>
> > >>> regression.diffs says:
> > >>>
> > >>> + SELECT x.* FROM t1, xmltable(XMLNAMESPACES(DEFAULT 'http://x.y'),
> > >>> '/rows/row' PASSING t1.doc COLUMNS data int PATH
> > >>> 'child::a[1][attribute::hoge="haha"]') as x;
> > >>> + data
> > >>> + ------
> > >>> + (0 rows)
> > >>> +
> > >>>
> > >>> Maybe you forgot to git-add the expected file?
> > >>>
> > >>> [1] https://travis-ci.org/postgresql-cfbot/postgresql/
> builds/305979133
> > >>
> > >>
> > >> unfortunately xml.out has 3 versions and is possible so one version
> > >> should be taken elsewhere than my comp.
> > >>
> > >> please can me send your result xml.out file?
> > >>
> > >
> > > looks like this case is without xml support so I can fix on my comp.
> > >
> > >
> > fixed regress test
>
> (I wouldn't have found that..)
>
> I have three comments on the behavior and one on documentation.
>
> 1. Lack of syntax handling.
>
> ["'" [^'] "'"] is also a string literal, but getXPathToken is
> forgetting that and applying default namespace mistakenly to the
> astring content.
>

In this case, I am not sure if I understand well.

Within expressions, literal strings are delimited by single or double
quotation marks, which are also used to delimit XML attributes. To avoid a
quotation mark in an expression being interpreted by the XML processor as
terminating the attribute value the quotation mark can be entered as a
character reference (&quot; or &apos;). Alternatively, the expression can
use single quotation marks if the XML attribute is delimited with double
quotation marks or vice-versa.

So if I understand well, then XML string can looks like ' some " some ' or
" some ' some ". I fixed it.

> 2. Additional comment might be good.
>
> It might be better having additional description about default
> namespace in the comment starts from "Namespace mappings are
> passed as text[]" in xpth_internal().
>

fixed

> 3. Inconsistent behavior from named namespace.
>
> | - function context, aliases are <emphasis>local</emphasis>).
> | + function context, aliases are <emphasis>local</emphasis>). Default
> namespace has
> | + empty name (empty string) and should be only one.
>
> This works as the description, on the other hand the same
> namespace prefix can be defined twice or more in the array and
> the last one is in effect. I don't see a reason for
> differenciating the default namespace case.
>

It looks like libxml2 bug. There is no sense to use more than one default
namespace, and although it is inconsistent with other namespaces, I am
thinking so it is correct. Is better to raise error early. In this case
Postgres expects so libxml2 ensure all namespace checks and it is tolerant.
Default namespace is implemented inside Postgres, and I don't see any
advantage of tolerant behave. More default namespaces is disallowed for
XMLTABLE - so some inconsistency there should be. In this case I prefer
raise error to signalize ambiguous or badly formatted input clearly.

>
>
> 4. Comments on the documentation part.
>
> # Even though I'm not sutable for commenting on wording...
>
> | + Inside predicate literals <literal>and</literal>,
> <literal>or</literal>,
> | + <literal>div</literal> and <literal>mod</literal> are used as
> keywords
> | + (XPath operators) every time and default namespace are not applied
> there.
>
> *I*'d like to have a comma between the predicate and literals,
> and have a 'a' before prediate. Or 'Literals .. inside a
> predicate' might be better?
>

fixed

>
> 'are used as keywords' might be better being 'are identifed as
> keywords'?
>

fixed

> Default namespace is applied to tag names except the listed
> keywords even inside a predicate. So 'are not applied there'
> might be better being 'are not applied to them'? Or 'are not
> applied in the case'?
>
> | + If you would to use these literals like tag names, then the
> default namespace
> | + should not be used, and these literals should be explicitly
> | + labeled.
> | + </para>
>
> Default namespace is not applied *only to* such keywords inside a
> predicate. Even if an Xpath expression contains such a tag name,
> default namespace still works for other tags. Does the following
> make sense?
>
> + Use named namespace to qualify such tag names appear in an
> + XPath predicate.
>
>
fixed

I hope so some native speaker finalize doc. It is out of my knowledges
.

>
>
> ===
> After the aboves are addressed (even or rejected), I think I
> don't have no additional comment.
>
>
> - This current patch applies safely (with small shifts) on the
> current master.
>
> - The code looks fine for me.
>
> - This patch translates the given XPath expression by prefixing
> unprefixed tag names with a special namespace prefix only in
> the case where default namespace is defined, so the existing
> behavior is not affected.
>
> - The syntax is existing but just not implemented so I don't
> think no arguemnts needed here.
>
> - It undocumentedly inhibits the usage of the namespace prefix
> "pgdefnamespace.pgsqlxml.internal" but I believe no one can
> notice that.
>
> - The default-ns translator (xpath_parser.c) seems working
> perfectly with some harmless exceptions.
>
> (xpath specifications is here: https://www.w3.org/TR/1999/
> REC-xpath-19991116/)
>
> Related unused features (and not documented?):
> context variables ($n notations),
> user-defined functions (or function names prefixed by a namespace
> prefix)

I did some fast check - and these features are not supported by libxml2 -
so it is question if it should be parsed by out xpath parser. So there are
not possible to check it, test it :(

>
> Newly documented behavior:
> the default namespace isn't applied to and/or/div/mod.
>
> - Dodumentation looks enough.
>
> - Regression test doesn't cover the XPath syntax but it's not
> viable. I am fine with the basic test cases added by the
> current patch.
>
> regards,
>

I am sending updated version.

Very much thanks for very precious review

Pavel

> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
>

Attachment Content-Type Size
xml-xpath-default-ns-7.patch text/x-patch 31.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2018-01-24 09:33:10 Re: Re: [HACKERS] pgbench randomness initialization
Previous Message KAWAMICHI Ryoji 2018-01-24 09:17:02 Re: proposal: alternative psql commands quit and exit