Re: possible encoding issues with libxml2 functions

From: Noah Misch <noah(at)leadboat(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: possible encoding issues with libxml2 functions
Date: 2017-03-17 03:23:27
Message-ID: 20170317032327.GA1993326@tornado.leadboat.com
Lists: pgsql-hackers

On Sun, Mar 12, 2017 at 10:26:33PM +0100, Pavel Stehule wrote:
> 2017-03-12 21:57 GMT+01:00 Noah Misch <noah(at)leadboat(dot)com>:
> > On Sun, Mar 12, 2017 at 08:36:58PM +0100, Pavel Stehule wrote:
> > > 2017-03-12 0:56 GMT+01:00 Noah Misch <noah(at)leadboat(dot)com>:
> > Please add a test case.
>
> It needs an application - currently there is no way to import an XML
> document via the recv API :(

I think xml_in() can create every value that xml_recv() can create; xml_recv()
is just more convenient given diverse source encodings. If you make your
application store the value into a table, does "pg_dump --inserts" emit code
that reproduces the same value? If so, you can use that in your test case.
If not, please provide precise instructions (code, SQL commands) for
reproducing the bug manually.

> > Why not use xml_parse() instead of calling xmlCtxtReadMemory() directly?
> > The answer is probably in the archives, because someone understood the
> > problem enough to document "Some XML-related functions may not work at all
> > on non-ASCII data when the server encoding is not UTF-8. This is known to
> > be an issue for xpath() in particular."
>
>
> Probably there are two possible issues

Would you research in the archives to confirm?

> 1. what I touched - the recv function converts to the database encoding, but
> the document encoding is not updated.

Using xml_parse() would fix that, right?
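
To make point 1 concrete, here is a minimal sketch in plain Python. It is not
PostgreSQL or libxml2 code; it only assumes that the receive step converts the
bytes to the database encoding (UTF-8 here) while the encoding declaration
inside the document keeps naming the original encoding. The sample document
and variable names are made up for illustration.

```python
# Illustration only: plain Python string handling, not PostgreSQL or libxml2.
original = '<?xml version="1.0" encoding="ISO-8859-2"?><a>příliš</a>'

# The client sends the document in the encoding its declaration names.
sent = original.encode("iso-8859-2")

# Assumed model of the receive step: bytes are converted to the database
# encoding (UTF-8 here), but the declaration still says ISO-8859-2.
stored = sent.decode("iso-8859-2").encode("utf-8")

# A consumer that trusts the stale declaration misreads the non-ASCII text.
misread = stored.decode("iso-8859-2")

# A consumer that uses the actual (database) encoding reads it correctly.
correct = stored.decode("utf-8")

print(misread == original)  # False
print(correct == original)  # True
```

The ASCII parts survive either way, which matches the documented caveat that
the trouble shows up only with non-ASCII data.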

> 2. there is no way to convert from the document encoding to the database
> encoding.

Both xml_in() and xml_recv() require the value to be representable in the
database encoding, so I don't think this particular problem can remain by the
time we reach an xpath_internal() call.
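
As a rough analogy in plain Python (again not PostgreSQL code): if the input
step refuses any value that cannot be represented in the database encoding,
nothing unrepresentable is left for later processing to trip over. The helper
name to_database_encoding and the choice of ISO-8859-2 as the database
encoding are assumptions made for this sketch.

```python
def to_database_encoding(text: str, database_encoding: str) -> bytes:
    """Hypothetical stand-in for an input step that rejects values
    not representable in the database encoding."""
    # str.encode raises UnicodeEncodeError for unrepresentable characters.
    return text.encode(database_encoding)


# Every character here exists in ISO-8859-2, so the value is accepted.
accepted = to_database_encoding("<a>příliš</a>", "iso-8859-2")

# The euro sign does not exist in ISO-8859-2, so the value is rejected
# at input time rather than at some later processing step.
try:
    to_database_encoding("<a>€</a>", "iso-8859-2")
    rejected = False
except UnicodeEncodeError:
    rejected = True

print(rejected)  # True
```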
