Re: XML type and XPath

From: "Nikolay Samokhvalov" <samokhvalov(at)gmail(dot)com>
To: "Peter Eisentraut" <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: XML type and XPath
Date: 2007-01-29 22:12:20
Message-ID: e431ff4c0701291412p5b374e1fx44e74a8b64f613b8@mail.gmail.com
Lists: pgsql-hackers

BTW,

Moreover, I would like xpath_string() which return

On 1/29/07, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
[...]
>
> So, while I realize that I've been arguing for a lean core recently, I
> want to propose that we add a small set of XPath support functions to
> the core. This would come down to approximately the following set
>
> xpath_boolean(query, xml)
> xpath_number(query, xml)
> xpath_string(query, xml)
> xpath_nodeset(query, xml) -- API and return type still unclear

As for the last one, I am for xml[] as a result type, especially if
we have xpath* in contrib. These are not XQuery sequences, but at least
it allows the user to see all XML fragments (and manage them somehow --
if he wants, he can concatenate them into one value using the
corresponding function).
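
To illustrate the xml[] idea, here is a minimal sketch in Python rather
than server-side code. The function name xpath_nodeset() comes from the
proposal above; the sample document and the use of Python's ElementTree
(which supports only a limited XPath subset, evaluated relative to the
root element) are my assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def xpath_nodeset(expr, xml_text):
    """Return every node matching expr as a list of serialized fragments.

    Stand-in for an xml[] result: one array element per matching node.
    """
    root = ET.fromstring(xml_text)
    # findall() evaluates expr relative to the root element.
    return [ET.tostring(node, encoding="unicode") for node in root.findall(expr)]


# Hypothetical sample document (no text between the items).
fragments = xpath_nodeset("item", "<list><item>a</item><item>b</item></list>")

# The caller sees every fragment and may concatenate them if desired.
combined = "".join(fragments)
```

The point is only that the caller gets all matches and decides what to
do with them, instead of silently receiving the first one.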

As for #1-3 -- they are very simple things; I do not like them,
because they return only one scalar value, which is the one
encountered first. I do not think they are very useful functions at
all... Moreover, in the case of xpath_string() I think it should work
in the following manner:
1. Find all nodes that match the given expression. In the general
case it will be a set of nodes; OK, let's take only the first one, as
we do with the other functions...
2. For this node, retrieve all text nodes that are its descendants.
This will be an ordered set of text values.
3. Concatenate all these values and return them as a single string.
I suppose only such behaviour is in compliance with the XML data
model -- as an example, consider the following XML fragment:
'<a>most <b>advanced</b> open source database</a>'.
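
The three steps above can be sketched as follows. This is a Python
illustration only, not a proposed implementation: xpath_string() is the
name from the proposal, while the use of ElementTree and its restricted
XPath subset (paths relative to the root element) is my assumption.

```python
import xml.etree.ElementTree as ET


def xpath_string(expr, xml_text):
    """String value of the first node matching expr, or None if no match."""
    root = ET.fromstring(xml_text)
    # Step 1: take only the first matching node.
    node = root.find(expr)
    if node is None:
        return None
    # Steps 2 and 3: collect all descendant text nodes in document order
    # and concatenate them into a single string.
    return "".join(node.itertext())


doc = "<a>most <b>advanced</b> open source database</a>"

full = xpath_string(".", doc)             # "most advanced open source database"
first_text_only = ET.fromstring(doc).text  # "most " (only the first text node)
```

Taking only the first text node of <a> loses everything after "most ",
which is the conformance problem I mean.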

So, for xpath_string() I see two issues -- 1) a lack of usability if
it returns only one (the first) value from a possible sequence of
values; 2) poor conformance if it takes only one text node belonging
to the first context node.

BTW, maybe it would be useful to have several functions, one for each
behaviour that can be useful.

Also, I think it'd be better not to use the word "query" when speaking
of XPath; "XPath expression" is much better (to avoid confusion with
XML Query).

> We also have prospects that later on we might get fancy GIN-based
> indexing support for XPath, which might need another xpath_matches()
> function or operator of some kind.

Now I'm trying to collect all thoughts regarding indexes and express
them in a short message (what types of queries should be considered;
what types of indexes would support those queries).

BTW, do not forget that one type of index is already available -- it's
simply a functional index on xpath_*() with a static XPath expression
(i.e. one known a priori as a constant value).

> As far as contrib/xml2 is concerned, I'm not going to make any efforts
> to make the interface compatible because that module has a rather
> pragmatic design, whereas I'd rather just provide the raw operations
> that can be assembled easily by the user to achieve some of the things
> that contrib/xml2 does now. Once some description of transition steps
> has been developed, I'd deprecate the contrib/xml2 module and probably
> remove it after 8.3.
>
> In the wiki we have collected some random ideas of other interesting
> operations on XML types
> (http://developer.postgresql.org/index.php/XML_Todo, near the bottom).
> That list at the moment says:
>
> DTD validation
> Relax-NG
> XSLT
> XML Canonical (to compare XML values)
> Pretty-printing XML (e.g., indenting)

I've added "Shredding with annotated schemas" to this list (with a
brief description of why it could be needed).

Also, in the long term I see such items as
- integration/support in pl/perl and other pl-langs that can work with XML;
- work with web services (maybe it'd be better to use pl/perl here).
Maybe it is too early to add such things even to the bottom of the Todo list :-)

--
Best regards,
Nikolay
