Re: Native XML

From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Anton <antonin(dot)houska(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Native XML
Date: 2011-03-09 19:21:03
Message-ID: 4D77D31F.9060501@gmail.com
Lists: pgsql-hackers

On 2011-03-09 19:30, Robert Haas wrote:
> On Wed, Mar 9, 2011 at 1:11 PM, Bruce Momjian<bruce(at)momjian(dot)us> wrote:
>> Robert Haas wrote:
>>> On Mon, Feb 28, 2011 at 10:30 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> Well, in principle we could allow them to work on both, just the same
>>>> way that (for instance) "+" is a standardized operator but works on more
>>>> than one datatype. But I agree that the prospect of two parallel types
>>>> with essentially duplicate functionality isn't pleasing at all.
>>> The real issue here is whether we want to store XML as text (as we do
>>> now) or as some predigested form which would make "output the whole
>>> thing" slower but speed up things like xpath lookups. We had the same
>>> issue with JSON, and due to the uncertainty about which way to go with
>>> it we ended up integrating nothing into core at all. It's really not
>>> clear that there is one way of doing this that is right for all use
>>> cases. If you are storing xml in an xml column just to get it
>>> validated, and doing no processing in the DB, then you'd probably
>>> prefer our current representation. If you want to build functional
>>> indexes on xpath expressions, and then run queries that extract data
>>> using other xpath expressions, you would probably prefer the other
>>> representation.
>> Someone should measure how much overhead the indexing of xml values
>> might have. If it is minor, we might be OK with only an indexed xml
>> type.
> I think the relevant thing to measure would be how fast the
> predigested representation speeds up the evaluation of xpath
> expressions.
About a predigested representation, I hope I'm not insulting anyone's
education here, but a lot of XML database 'accelerators' seem to use
the pre-order and post-order ranks (see
http://en.wikipedia.org/wiki/Tree_traversal) of the document nodes. The
following two PDFs show how these orders can be used to query, e.g.,
all ancestors of a node. From slide 10 of the second PDF: for nodes x and
y, x is an ancestor of y when x.pre < y.pre AND x.post > y.post.

www.cse.unsw.edu.au/~cs4317/09s1/tutorials/tutor4.pdf about the format
www.cse.unsw.edu.au/~cs4317/09s1/tutorials/tutor10.pdf about querying
the format
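To make the ancestor test concrete, here is a minimal Python sketch of the pre/post encoding described above. It uses the standard-library xml.etree.ElementTree parser and a made-up sample document. The helper names (encode, ancestors, descendants) are hypothetical, and everything is kept in memory: this illustrates the test only, not how a database would store or index the ranks.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeRank:
    tag: str
    pre: int   # rank in a pre-order traversal (node before its children)
    post: int  # rank in a post-order traversal (node after its children)


def encode(root):
    """Assign pre- and post-order ranks to every element (hypothetical helper)."""
    pre_counter = 0
    post_counter = 0
    ranks = []

    def visit(elem):
        nonlocal pre_counter, post_counter
        my_pre = pre_counter
        pre_counter += 1
        for child in elem:
            visit(child)
        # The post rank is known only once all children have been visited.
        ranks.append(NodeRank(elem.tag, my_pre, post_counter))
        post_counter += 1

    visit(root)
    ranks.sort(key=lambda n: n.pre)  # return nodes in document order
    return ranks


def is_ancestor(x, y):
    # x is an ancestor of y when x.pre < y.pre AND x.post > y.post
    return x.pre < y.pre and x.post > y.post


def ancestors(ranks, y):
    return [x.tag for x in ranks if is_ancestor(x, y)]


def descendants(ranks, x):
    return [y.tag for y in ranks if is_ancestor(x, y)]


# Sample document (made up for illustration).
ranks = encode(ET.fromstring("<a><b><c/><d/></b><e><f/></e></a>"))
by_tag = {n.tag: n for n in ranks}
print(ancestors(ranks, by_tag["f"]))    # ['a', 'e']
print(descendants(ranks, by_tag["b"]))  # ['c', 'd']
```

Because the test is just two integer comparisons, it can in principle be answered with ordinary range predicates over stored (pre, post) values instead of walking the tree node by node.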

regards,
Yeb Havinga
