Re: xpath processing brain dead

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: xpath processing brain dead
Date: 2009-02-26 13:54:29
Message-ID: 49A69F15.9010706@dunslane.net
Lists: pgsql-hackers

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> For fear of passing an ill-formed fragment of xml to the processor, we
>> strip the xml declaration if any and surround what's left with '<x>' and
>> '</x>' and prepend '/x' to the supposed xpath. This is just horrible.
>>
>
> I seem to recall having complained about that at the time, but I didn't
> (and don't) know enough about xpath to do any better.
>

Well, a few of us do. I guess I took my eye off the ball a bit back when
we were putting this into 8.3.

>
>> This whole thing is a mess, and I suspect the only fix for now is to
>> undo all the mangling of both the xml and the xpath expression.
>>
>
> I don't think we should change the behavior if it's just to arrive at
> another less-than-desirable behavior. Whacking semantics around afresh
> with each release does not endear us to users. If we know how to fix it
> right, great; but if we can't then we should keep compatibility with 8.3
> until we can.
>
>
>

Honestly, this is a bug, pure and simple. There really can't be an
argument about that. For the stable branch, we could make the following
changes that should result in a Pareto improvement (nothing gets worse
while some things get better):

* only do the xml transformation if the xml is known not to be
  well formed
* if we need to mangle the xpath expression due to doing the xml
  transformation, then unless the xpath expression begins with a
  '/', prepend '/x//' to it. (A double slash is xpath shorthand for
  the descendant-or-self axis - in effect it stands for any number
  of intervening elements.)
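
To make the two rules concrete, here is a rough sketch in Python rather than the backend's C. The function names (is_well_formed, prepare) are made up for illustration and are not anything in the tree. It also assumes that an expression which does begin with '/' keeps the existing treatment of having '/x' prepended, which the list above does not spell out.

```python
import re
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as a complete XML document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def prepare(xml_text: str, xpath: str) -> tuple[str, str]:
    """Hypothetical helper: return the (xml, xpath) pair to hand to the processor."""
    # Rule 1: well-formed input is passed through untouched.
    if is_well_formed(xml_text):
        return xml_text, xpath

    # Otherwise strip any xml declaration and wrap the fragment in <x>...</x>.
    body = re.sub(r"^\s*<\?xml[^>]*\?>", "", xml_text)
    wrapped = "<x>" + body + "</x>"

    # Rule 2: a relative expression gets '/x//' so that it can still match
    # at any depth below the wrapper. An absolute expression gets '/x'
    # (assumed here to be the existing behaviour).
    if xpath.startswith("/"):
        return wrapped, "/x" + xpath
    return wrapped, "/x//" + xpath
```

With this, prepare("<a><b/></a>", "//b") hands both arguments back unchanged, while the fragment "<a/><b/>" with the expression "b" comes back wrapped, with the expression rewritten to "/x//b".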

But that's just a holding operation. For 8.4 we should stop this
nonsense and simply say that it is up to the programmer to ensure that
the xml passed to the processor is well formed.

The thing that is so very bad about this is that if the programmer *has*
made sure that the inputs are correct, s/he can still end up with broken
results. If we're going to try to fix bad inputs, we must make damn sure
that we don't break on correct inputs as a result. However, I can't
offhand think of a quick way to fix bad inputs that doesn't carry some
danger to good inputs. Until someone comes up with something tolerably
bulletproof, I suggest that we simply say that it is the programmer's
responsibility.

cheers

andrew
