Re: proposal - Default namespaces for XPath expressions (PostgreSQL 11)

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal - Default namespaces for XPath expressions (PostgreSQL 11)
Date: 2017-11-09 09:11:20
Message-ID: CAFj8pRDvD7yxaPtWxmc12UnpWrh8Q7v_iOQ6wMHncwZe3-T4Kw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

2017-11-06 14:00 GMT+01:00 Kyotaro HORIGUCHI <
horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>:

> Thank you for the new patch.
>
> - The latest patch is missing xpath_parser.h at least since
> ns-3. That of the first (not-numbered) version was still
> usable.
>
> - c29c578 conflicts on doc/src/sgml/func.sgml
>
>
> At Sun, 15 Oct 2017 12:06:11 +0200, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote in <CAFj8pRCYBH+a6oJoEYUFDUpBQ1ySwtt2CfnFZxs2A
> b9EfONbUQ(at)mail(dot)gmail(dot)com>
> > 2017-10-02 12:22 GMT+02:00 Kyotaro HORIGUCHI <
> > horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>:
> >
> > > Hi, thanks for the new patch.
> > >
> > > # The patch is missing xpath_parser.h. That of the first patch was
> usable.
> > >
> > > At Thu, 28 Sep 2017 07:59:41 +0200, Pavel Stehule <
> pavel(dot)stehule(at)gmail(dot)com>
> > > wrote in <CAFj8pRBMQa07a=+qQAVMtz5M_hqkJBhiQSOP76+-BrFDj37pvg@
> > > mail.gmail.com>
> > > > Hi
> > > >
> > > > now xpath and xpath_exists supports default namespace too
> > >
> > > At Wed, 27 Sep 2017 22:41:52 +0200, Pavel Stehule <
> pavel(dot)stehule(at)gmail(dot)com>
> > > wrote in <CAFj8pRCZ8oneG7g2vxs9ux71n8A9twwUO7zQpJiuz+7RGSpSuw(at)mail(dot)
> > > gmail.com>
> > > > > 1. Uniformity among simliar features
> > > > >
> > > > > As mentioned in the proposal, but it is lack of uniformity that
> > > > > the xpath transformer is applied only to xmltable and not for
> > > > > other xpath related functions.
> > > > >
> > > >
> > > > I have to fix the XPath function. The SQL/XML function Xmlexists
> doesn't
> > > > support namespaces/
> > >
> > > Sorry, I forgot to care about that. (And the definition of
> > > namespace array is of course fabricated by me). I'd like to leave
> > > this to committers. Anyway it is working but the syntax (or
> > > whether it is acceptable) is still arguable.
> > >
> > > SELECT xpath('/a/text()', '<my:a xmlns:my="http://example.com">
> > > test</my:a>',
> > > ARRAY[ARRAY['', 'http://example.com']]);
> > > | xpath
> > > | --------
> > > | {test}
> > > | (1 row)
> > >
> > >
> > > The internal name is properly rejected, but the current internal
> > > name (pgdefnamespace.pgsqlxml.internal) seems a bit too long. We
> > > are preserving some short names and reject them as
> > > user-defined. Doesn't just 'pgsqlxml' work?
> > >
> > >
> > > Default namespace correctly become to be applied on bare
> > > attribute names.
> > >
> > > > updated doc,
> > > > fixed all variants of expected result test file
> > >
> > > Sorry for one by one comment but I found another misbehavior.
> > >
> > > create table t1 (id int, doc xml);
> > > insert into t1
> > > values
> > > (5, '<rows xmlns="http://x.y"><row><a hoge="haha">50</a></row></
> > > rows>');
> > > select x.* from t1, xmltable(XMLNAMESPACES('http://x.y' AS x),
> > > '/x:rows/x:row' passing t1.doc columns data int PATH
> > > 'child::x:a[1][attribute::hoge="haha"]') as x;
> > > | data
> > > | ------
> > > | 50
> > >
> > > but the following fails.
> > >
> > > select x.* from t1, xmltable(XMLNAMESPACES(DEFAULT 'http://x.y'),
> > > '/rows/row' passing t1.doc columns data int PATH
> > > 'child::a[1][attribute::hoge="haha"]') as x;
> > > | data
> > > | ------
> > > |
> > > | (1 row)
> > >
> > > Perhaps child::a is not prefixed by the transformation.
> > >
> >
> > the problem was in unwanted attribute modification. The parser didn't
> > detect "attribute::hoge" as attribute. Updated parser does it. I reduce
> > duplicated code there more.
>
> It worked as expected. But the comparison of "attribute" is
> missing t1.length = 9 so the following expression wrongly passes.
>
> child::a[1][attributeabcdefg::hoge="haha"
>
> It is confusing that is_qual_name becomes true when t2 is not a
> "qual name", and the way it treats a double-colon is hard to
> understand.
>
> It essentially does inserting the default namespace before
> unqualified non-attribute name. I believe we can easily
> look-ahead to detect a double colon and it would make things
> simpler. Could you consider something like the attached patch?
> (applies on top of ns-4 patch.)
>
> > > XPath might be complex enough so that it's worth switching to
> > > yacc/lex based transformer that is formally verifiable and won't
> > > need a bunch of cryptic tests that finally cannot prove the
> > > completeness. synchronous_standy_names is far simpler than XPath
> > > but using yacc/lex parser.
> > >
> > >
> > > Anyway the following is nitpicking of the current xpath_parser.c.
> > >
> > > - NODENAME_FIRSTCHAR allows '-' as the first char but it is
> > > excluded from NameStartChar (https://www.w3.org/TR/REC-
> > > xml/#NT-NameStartChar)
> > > I think characters with high-bit set is okay.
> > > Also IS_NODENAME_CHAR should be changed.
> > >
> >
> > fixed
> >
> >
> > > - NODENAME_FIRSTCHAR and IS_NODENAME_CHAR is in the same category
> > > but have different naming schemes. Can these are named in the same
> way?
> > >
> >
> > fixed
> >
> >
> > > - The current transoformer seems to using up to one token stack
> > > depth. Maybe the stack is needless. (pushed token is always
> > > popped just after)
> > >
> >
> > fixed
>
> Thank you.
>
> I found another (and should be the last, so sorry..) functional
> defect in this. This doesn't add default namespace if the tag
> name in a predicate is 'and' or 'or'. It needs to be fixed, or
> wrote in the documentation as a restriction. (seem hard to fix
> it..)
>
> create table t1 (id int, doc xml);
> insert into t1 values (1, '<rows xmlns="http://x.y"><row><val>
> 50</val><and>60</and></row></rows>');
> select x.* from t1, xmltable(XMLNAMESPACES('http://x.y' AS x),
> '/x:rows/x:row' passing t1.doc columns data int PATH
> 'x:val[../x:and = 60]') as x;
> data
> ------
> 50
> (1 row)
> select x.* from t1, xmltable(XMLNAMESPACES(DEFAULT 'http://x.y'),
> '/rows/row' passing t1.doc columns data int PATH
> 'val[../and = 60]') as x;
> data
> ------
>
> (1 row)
>
>
yes - this check needs context parser. I am expecting, this case is corner
case, not too much usual, so doc based solution is enough.

>
>
> Other comments are follows.
>
> - Please add more comments. XPATH_TOKEN_NAME in _transformXPath
> in my patch has more
>
> - Debug output might be needed.
>
> # sorry now time's up. will continue tomorrow.
>

I fixed I hope almost all issues - your patch is merged with some changes.
The most significant change is a reaction to broken XPath expression. I
prefer do nothing - libxml2 raise a error.

Attached new version.

Thank you for tips, ideas, code :)

> regards,
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>

Attachment Content-Type Size
xml-xpath-default-ns-5.patch text/x-patch 28.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabrízio de Royes Mello 2017-11-09 09:25:15 Re: (spelling) Ensure header of postgresql.auto.conf is consistent
Previous Message amul sul 2017-11-09 08:55:24 Re: Proposal: Improve bitmap costing for lossy pages