Skip site navigation (1) Skip section navigation (2)

Re: Native XML

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Anton <antonin(dot)houska(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Native XML
Date: 2011-02-28 16:23:58
Message-ID: 4D6BCC1E.3010406@dunslane.net (view raw or flat)
Thread:
Lists: pgsql-hackers

On 02/28/2011 10:51 AM, Tom Lane wrote:
> Andrew Dunstan<andrew(at)dunslane(dot)net>  writes:
>> xpath_table is severely broken by design IMNSHO. We need a new design,
>> but I'm reluctant to work on that until someone does LATERAL, because a
>> replacement would be much nicer to design with it than without it.
> Well, maybe I'm missing something, but I don't really understand why
> xpath_table's design is so unreasonable.  Also, what would a better
> solution look like exactly?  (Feel free to assume LATERAL is available.)
>


What's unreasonable about it is that the supplied paths are independent 
of each other, and evaluated in the context of the entire XML document.

Let's take the given example in the docs, changed slightly to assume 
each piece of XML can have more than one article listing in it (i.e,. 
'article' is not the root node of the document):

    SELECT * FROM
    xpath_table('article_id',
                 'article_xml',
                 'articles',
                 '//article/author|//article/pages|//article/title',
                 'date_entered>  ''2003-01-01'' ')
    AS t(article_id integer, author text, page_count integer, title text);

There is nothing that says that the author has to come from the same 
article as the title, nor is there any way of saying that they must. If 
an article node is missing author or pages or title, or has more than 
one where its siblings do not, they will line up wrongly.

An alternative would be to supply a single xpath expression that would 
specify the context nodes to be iterated over (in this case that would 
be '//article') and a set of xpath expressions to be evaluated in the 
context of those nodes (in this case 'article|pages|title' ort better 
yet, supply these as a text array). We'd produce exactly one row for 
each node found by the context expression, and take the first value 
found by each of the column expressions in that context (or we could 
error out if we found more than one, or supply an array if the result 
field is an array). So with LATERAL taking care of the rest, the 
function signature could be something like:

    xpath_table_new(
         doc xml,
         context_xpath text,
         column_xpath text[])
    returns setof record


Given this, you could not get a row with title and author from different 
article nodes in the source document like you can now.

cheers

andrew


In response to

pgsql-hackers by date

Next:From: Tom LaneDate: 2011-02-28 16:39:17
Subject: EXPLAIN doesn't show sufficient info for wCTE cases
Previous:From: Robert HaasDate: 2011-02-28 16:23:10
Subject: Re: Native XML

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group