
Re: remove contrib/xml2

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: remove contrib/xml2
Date: 2010-02-28 22:58:42
Message-ID: 4B8AF522.9020902@dunslane.net
Lists: pgsql-hackers

Tom Lane wrote:
> I believe I have fixed all the reported crashes in contrib/xml2.

Yay! Well done! That at least removes any possible urgency about
removing the module.

> However there is still this issue pointed out by Robert:
>
>> CREATE TABLE xpath_test (id integer NOT NULL, t xml);
>> INSERT INTO xpath_test VALUES (1, '<rowlist><row a="1"/><row a="2" b="oops"/></rowlist>');
>> SELECT * FROM xpath_table('id', 't', 'xpath_test',
>> '/rowlist/row/@a|/rowlist/row/@b', 'true') as t(id int4, a text, b text);
>
>> which yields an answer that is, at least, extremely surprising, if not
>> flat-out wrong:
>
>>  id | a |  b
>> ----+---+------
>>   1 | 1 | oops
>>   1 | 2 |
>> (2 rows)
>
> the point being that it seems like "oops" should be associated with "2"
> not "1".  The reason for that behavior is that xpath_table runs through
> the XPATH_NODESET results generated by the various XPaths and dumps the
> k'th one of each into the k'th output row generated for the current
> input row.  If there is any way to synchronize which node in each array
> goes with each node in each other array, it's not apparent to me, but
> I don't know libxml's API at all.  Perhaps there is some other call we
> should be using to evaluate all the XPaths in parallel?
>
> (The code is also unbelievably inefficient, recompiling each XPath
> expression once per output row (!); but it doesn't seem worth fixing
> that right away given that we might have to throw away the logic
> entirely in order to fix this bug.)

Damn that's ugly.
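
To make the positional pairing Tom describes concrete, here is a minimal
Python sketch that models it outside the database. It is not the contrib/xml2
code: the helper names are hypothetical, and because the standard library's
ElementTree cannot select attribute nodes with '@a', the two XPaths are
emulated by collecting attribute values in document order.

```python
import xml.etree.ElementTree as ET
from itertools import zip_longest

DOC = '<rowlist><row a="1"/><row a="2" b="oops"/></rowlist>'


def attribute_values(root, attr):
    """Stand-in for evaluating '/rowlist/row/@<attr>' (hypothetical helper).

    Returns the attribute values in document order, skipping rows that
    lack the attribute, which is what an XPath node set would contain.
    """
    return [row.get(attr) for row in root.findall("row")
            if row.get(attr) is not None]


def positional_rows(xml_text, attrs):
    """Put the k'th result of each expression into the k'th output row."""
    root = ET.fromstring(xml_text)
    nodesets = [attribute_values(root, attr) for attr in attrs]
    # Shorter node sets are padded with None; nothing ties a value back
    # to the <row> element it came from.
    return list(zip_longest(*nodesets))


rows = positional_rows(DOC, ["a", "b"])
print(rows)  # [('1', 'oops'), ('2', None)]
```

The "oops" value lands next to a="1" because the node set for @b has only
one member, so it is consumed by the first output row.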


ISTM the missing piece is really in our API. We need to be able to 
specify a nodeset to iterate over, and then for each node take the first 
value produced by each xpath expression. So the example above would look 
something like:

    SELECT * FROM xpath_table('id', 't', 'xpath_test',
    '/rowlist/row', '@a|@b', 'true') as t(id int4, a text, b text);
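
As a rough illustration of the semantics I have in mind, the following
Python sketch iterates over the row nodes first and evaluates each column
expression relative to the current node. The function name and its
parameters are assumptions for illustration only, and attribute lookups
stand in for the relative XPaths '@a' and '@b', since ElementTree cannot
return attribute nodes.

```python
import xml.etree.ElementTree as ET

DOC = '<rowlist><row a="1"/><row a="2" b="oops"/></rowlist>'


def per_node_rows(xml_text, row_path, attrs):
    """Emit one output row per node matched by row_path (hypothetical API).

    row_path is evaluated relative to the document element, so "row"
    plays the role of '/rowlist/row' here.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.findall(row_path):
        # Each column is looked up on the current node only, so a missing
        # attribute yields None in that row instead of shifting values up.
        rows.append(tuple(node.get(attr) for attr in attrs))
    return rows


rows = per_node_rows(DOC, "row", ["a", "b"])
print(rows)  # [('1', None), ('2', 'oops')]
```

With this shape, "oops" stays attached to the row where a="2".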


Maybe we could approximate that with the current API by factoring out
the common root of the xpath expressions, but that's likely to be
extremely fragile and error-prone, and we already have bad experience
with trying to be too cute with xpath expressions.

cheers

andrew
