Re: Proposal: XML helper functions

From: Scott Bailey <artacus(at)comcast(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: XML helper functions
Date: 2010-01-05 19:07:28
Message-ID: 4B438DF0.8060108@comcast.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Merlin Moncure wrote:
> On Tue, Jan 5, 2010 at 1:14 PM, Scott Bailey <artacus(at)comcast(dot)net> wrote:
>> One of the problem with shredding XML is that it is very kludgy to get a
>> scalar value back from xpath. The xpath function always returns an array of
>> XML. So for example, to extract a numeric value you need to:
>> 1) use xpath to get the node
>> 2) get the first element of the XML array
>> 3) cast that to varchar
>> 4) cast that to numeric
>
> I just happen to be dealing with XML right now as well and my initial
> thought is that your suggestion doesn't buy you a whole lot: the root
> problem IMO is not dealing with what xpath gives you but that there is
> no DOMish representation of the xml document for you to query. You
> have to continually feed the entire document to xpath which is
> absolutely not scalable (if it works the way I think it does --
> haven't looked at the code).

No typically you'll only be passing the xml for a single "row" so what
we end up doing in Postgres typically looks something like this:

SELECT xmlvalue('/row/@id', bitesizexml)::int AS id,
xmlvalue('/row/@lat', bitesizexml)::numeric AS lat,
xmlvalue('/row/@lon', bitesizexml)::numeric,
xmlvalue('/row/comment', bitesizexml) AS cmt
FROM (
SELECT unnest(xpath('/foo/row', mybigxmldoc)) AS bitesizexml
) sub

So only the one call has to work with the entire document. All the calls
to xmlvalue are passed a much smaller node to work with.

> xpath is great for simple things but it's too texty and you need a
> more robust API to handle documents for serious parsing on the
> backend. In the short term i'd advise doing work in another pl like
> perl.

This is basically the method used for Oracle too until they provided
XMLTable functionality. They had a function xmlsequence that basically
did the unnest(xpath()) part. Hopefully we'll get xmltable support soon.

Scott

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2010-01-05 19:14:03 true serializability and predicate locking
Previous Message Markus Wanner 2010-01-05 19:06:37 Re: Testing with concurrent sessions