Re: proposal - Default namespaces for XPath expressions (PostgreSQL 11)

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal - Default namespaces for XPath expressions (PostgreSQL 11)
Date: 2017-09-27 20:41:52
Message-ID: CAFj8pRCZ8oneG7g2vxs9ux71n8A9twwUO7zQpJiuz+7RGSpSuw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

2017-09-25 13:25 GMT+02:00 Kyotaro HORIGUCHI <
horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>:

> Hello, this patch have been ignored for a long time since its proposal...
>
> At Sat, 11 Mar 2017 20:44:31 +0100, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote in <CAFj8pRB+WDyDcZyGmfRdJ0HOoXugeaL-KNFeK9YA5Z10JN9qfA(at)mail(dot)gmail(dot)
> com>
> > Hi
> >
> > This proposal is followup of implementation of XMLTABLE.
> >
> > Lot of XML documents has assigned document namespace.
> >
> > <rows xmlns="http://x.y"><row><a>10</a></row></rows>
> >
> > For these XML document any search path must use schema "http://x.y".
> This
> > is not too intuitive, and from XMLTABLE usage is not too user friendly,
> > because the default column path (same like column name) cannot be used. A
> > solution of this issue is default namespace - defined in SQL/XML.
> >
> > example - related to previous xml
> >
> > without default namespace:
> > XMLTABLE(NAMESPACES('http://x.y' AS aux),
> > '/aux:rows/aux:row' PASSING ...
> > COLUMNS a int PATH 'aux:a')
> >
> > with default namespace
> > XMLTABLE(NAMESPACES(DEFAULT 'http://x.y'),
> > '/rows/row' PASSING ...
> > COLUMNS a int);
> >
> >
> > Unfortunately the libxml2 doesn't support default namespaces in XPath
> > expressions. Because the libxml2 functionality is frozen, there is not
> big
> > chance for support in near future. A implementation is not too hard -
> > although it requires simple XPath expressions state translator.
> >
> > The databases with XMLTABLE implementation supports default namespace for
> > XPath expressions.
> >
> > The patch for initial implementation is attached.
>
> The original message is a bit less informative for those who
> wants to review this but are not accustomed (like me) to this
> area. I try to augment this with a bit more information. (Perhaps)
>
> An example of this issue can be as follows.
>
> create table t1 (id int, doc xml);
> insert into t1
> values
> (1, '<rows xmlns="http://x.y"><row><a>10</a></row></rows>'),
> (2, '<rows xmlns="http://x.y"><row><a>20</a></row></rows>'),
> (3, '<rows xmlns="http://x.y"><row><a>30</a></row></rows>'),
> (4, '<rows xmlns="http://x.y"><row><a>40</a></row></rows>');
> select x.* from t1, xmltable('/rows/row' passing t1.doc columns data int
> PATH 'a') as x;
> | data
> | ------
> | (0 rows)
> select x.* from t1, xmltable(XMLNAMESPACES('http://x.y' as n),
> '/n:rows/n:row' passing t1.doc columns data int PATH 'n:a') as x;
> | data
> | ------
> | 10
> | 20
> | 30
> | 40
> | (4 rows)
>
> But, currently the follwing command fails with error.
>
> select x.* from t1, xmltable(XMLNAMESPACES(DEFAULT 'http://x.y'),
> '/rows/row' passing t1.doc columns data int PATH 'a') as x;
> | ERROR: DEFAULT namespace is not supported
>
> This patch let PostgreSQL allow this syntax by transforming xpath
> string when DEFAULT namespace is defined.
>

> =======================
> I have some review comments.
>
> This patch still applies with shifts and works as expected.
>
> 1. Uniformity among simliar features
>
> As mentioned in the proposal, but it is lack of uniformity that
> the xpath transformer is applied only to xmltable and not for
> other xpath related functions.
>

I have to fix the XPath function. The SQL/XML function Xmlexists doesn't
support namespaces/

>
>
> 2. XML comformance
>
> I'm not yet sure this works for the all extent of the
> available syntax but at least fails for the following
> expression.
>
> (delete from t1;)
> insert into t1
> values
> (5, '<rows xmlns="http://x.y"><row><a hoge="haha">50</a></row></
> rows>');
>
> select x.* from t1, xmltable(XMLNAMESPACES(DEFAULT 'http://x.y'),
> '/rows/row' passing t1.doc columns data int PATH 'a[1][(at)hoge]') as x;
> > data
> > ------
> >
> > (1 row)
>
>
> The following expression works.
>
> select x.* from t1, xmltable(XMLNAMESPACES('http://x.y' as x),
> '/x:rows/x:row' passing t1.doc columns data int PATH 'x:a[1][(at)hoge]') as x;
> > data
> > ------
> > 50
> > (1 row)
>
> The w3c says as follows.
>
> https://www.w3.org/TR/xml-names/#defaulting
> > The namespace name for an unprefixed attribute name always has no
> value.
>
> We currently don't have a means to make sure that this works
> correctly for the whole extent. More regression test helps?
>

I fixed this issue and I used your examples as regression tests

>
>
> 3. The predefined prefix for default namespace
>
> The patch defines the name of the defaut namespace as
> "pgdefnamespace". If a default namespace is defined, a
> namespace is made with the name and with_default_ns is true. If
> a namespace with the name is defined, a namespace is made also
> with the same name but with_default_ns is false. This causes a
> confused behavior.
>
> select x.* from t1, xmltable(XMLNAMESPACES(DEFAULT 'http://x.x', '
> http://x.y' as pgdefnamespace), '/rows/row' passing t1.doc columns data
> int PATH 'a') as x;
> | data
> | ------
> | 10
> | 20
> | 30
> | 40
> | (4 rows)
>
> The reason for the design is the fact that xmlXPathRegisterNs
> doesn't accept NULL or empty string as a namespace prefix and it
> only accepts a string consists of valid XML caharacters.
>
> Even if we are to live with such restriction and such name is
> hardly used, a namespace prefix with the name should be
> rejected.
>
>
it is fixed - look to regress test

>
> 4. A mistake in the documentaion ?
>
> The documentaion says about the XMLNAMESPACES clause as the
> folows.
>
> https://www.postgresql.org/docs/10/static/functions-xml.html
> > The optional XMLNAMESPACES clause is a comma-separated list of
> > namespaces. It specifies the XML namespaces used in the document
> > and their aliases.
>
> As far as looking into XmlTableSetNamespace, (and if I read the
> documentation correctly) the defined namespaces are not applied
> on documents, but expressions. This patch is not to blame for
> this. So this will be another patch backbatchable to Pg10.
>
> | The optional XMLNAMESPACES clause is a comma-separated list of
> | namespaces. It specifies the XML namespaces used in the
> | row_expression.
>
>
Regards

Pavel

>
> regards,
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
>

Attachment Content-Type Size
xml-xpath-default-ns-2.patch text/x-patch 14.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2017-09-27 21:45:15 Re: GSoC 2017: weekly progress reports (week 6)
Previous Message Marko Tiikkaja 2017-09-27 20:17:18 Re: 200 = 199 + 1?