Re: XPATH vs. server_encoding != UTF-8

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XPATH vs. server_encoding != UTF-8
Date: 2011-07-23 23:25:05
Message-ID: 95BAA09D-E242-44D5-89F5-A2D8350A364F@phlo.org
Lists: pgsql-hackers

On Jul23, 2011, at 22:49 , Peter Eisentraut wrote:

> On lör, 2011-07-23 at 17:49 +0200, Florian Pflug wrote:
>> The current thread about JSON and the ensuing discussion about the
>> XML types' behaviour in non-UTF8 databases made me try out how well
>> XPATH() copes with that situation. The code, at least, looks
>> suspicious - XPATH neither verifies that the server encoding is UTF-8,
>> nor does it pass the server encoding on to libxml's xpath functions.
>
> This issue is on the Todo list, and there are some archive links there.

Thanks for the pointer, but I think the discussion there doesn't
really apply here.

First, I didn't suggest (or implement) full support for XPATH() together
with server encodings other than UTF-8. My suggested patch simply
closes a hole in the implementation of the current behaviour. Instead of
relying on libxml to detect that the encoding isn't UTF-8, the patch
relies on libxml only to detect that the encoding isn't ASCII. Since all
supported server encodings are supersets of ASCII, the latter is trivial.
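
To illustrate why the ASCII case is the easy one, here is a minimal
standalone sketch. It is not taken from xml.c or from the patch, and
the helper name is_pure_ascii is hypothetical. It assumes, as stated
above, that the encoding is a superset of ASCII, so any byte with the
high bit set marks a non-ASCII character.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper for illustration only; not part of xml.c.
 *
 * Returns true if none of the first len bytes of s has its high bit
 * set, i.e. the buffer is plain 7-bit ASCII.  This is only meaningful
 * under the assumption that the encoding is a superset of ASCII.
 */
bool
is_pure_ascii(const char *s, size_t len)
{
    const unsigned char *p = (const unsigned char *) s;

    for (size_t i = 0; i < len; i++)
    {
        if (p[i] & 0x80)
            return false;   /* high bit set: not an ASCII byte */
    }
    return true;
}
```

A check of this kind needs no knowledge of which encoding is actually
in use, whereas deciding whether a byte sequence is really UTF-8 and
not, say, LATIN1 cannot be done reliably from the bytes alone.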

xml.c also seems to have changed quite a bit since this was last
discussed. Tom Lane argued against the proposed patch on the grounds
that there are many more places in xml.c which pass strings to libxml
without charset conversion. However, looking at it now, it seems that
all XML validation goes through xml_parse(), which actually converts
the XML to UTF-8. Only XPATH contains a separate code path, and chooses
to ignore encoding issues altogether.

best regards,
Florian Pflug
