Re: Native XML

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Anton <antonin(dot)houska(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Native XML
Date: 2011-03-01 13:43:58
Message-ID: 4D6CF81E.8020100@dunslane.net
Lists: pgsql-hackers

On 03/01/2011 08:16 AM, Robert Haas wrote:
> On Mon, Feb 28, 2011 at 6:54 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> There seems to be an almost universal assumption that storing XML in its
>> native form (i.e. a text stream) is going to produce inefficient results.
>> Maybe it will, but I think it needs to be fairly convincingly demonstrated.
>> And then we would have to consider the costs. For example, unless we
>> implemented our own XPath processor to work with our own XML format (do we
>> really want to do that?), to evaluate an XPath expression for a piece of XML
>> we'd actually need to produce the text format from our internal format
>> before passing it to some external library to parse into its internal format
>> and then process the XPath expression. That means we'd actually be making
>> things worse, not better. But this is clearly the sort of processing people
>> want to do - see today's discussion upthread about xpath_table.
> Well, obviously the only point of having our own internal format is if
> we have our own xpath processor etc. to match. One would think that
> this would be a lot faster than parsing the string with libxml2 every
> time we want to xpath it, especially for large documents. But then
> again, I haven't seen any benchmarks.

That would be a huge body of code we'd need to maintain, complex and
full of subtleties which, if we weren't deeply invested in the XML
standards, would bite us, I have no doubt.

Now, if someone wanted to start a project that added efficient
serialization/de-serialization of libxml2 (or other library) objects so
we could avoid constant parsing overhead, that would make lots more
sense to me.
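
To illustrate the parsing overhead in question, here is a rough sketch
that contrasts re-parsing the text for every path query with keeping
the parsed tree around for reuse. It is not libxml2 and not PostgreSQL
code: it uses Python's standard xml.etree.ElementTree (which supports
only a limited XPath subset) and an in-memory cache rather than real
serialization of library objects. The sample document and the function
names are made up for the example.

```python
import xml.etree.ElementTree as ET
from functools import lru_cache

# Hypothetical sample document, used only for illustration.
DOC = (
    "<library>"
    "<book id='1'><title>A</title></book>"
    "<book id='2'><title>B</title></book>"
    "</library>"
)


def titles_reparse(xml_text):
    """Parse the text on every call, then evaluate the path expression."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.findall("./book/title")]


@lru_cache(maxsize=128)
def parsed(xml_text):
    """Parse once per distinct document text and keep the tree in memory."""
    return ET.fromstring(xml_text)


def titles_cached(xml_text):
    """Evaluate the same path expression against the cached tree."""
    return [t.text for t in parsed(xml_text).findall("./book/title")]
```

Both functions return the same result; the second only pays the parse
cost the first time it sees a given document. Whether an equivalent
saving is achievable for stored libxml2 objects would still need
benchmarks.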

cheers

andrew
