Re: possible encoding issues with libxml2 functions

From: Noah Misch <noah(at)leadboat(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: possible encoding issues with libxml2 functions
Date: 2017-08-20 02:17:34
Message-ID: 20170820021734.GA4027908@rfd.leadboat.com
Lists: pgsql-hackers

On Fri, Aug 18, 2017 at 11:43:19PM +0200, Pavel Stehule wrote:
> yes, libxml2 probably tries to convert from UTF8 to the encoding specified
> in the header.

Yes. That has been the topic of this thread.

> a) all values created by the xml_in interface are in the database encoding -
> the input string is stored without any change. xml_parse is called only for
> validation.
>
> b) inside xml_parse, the input is converted to UTF8, and the document is read
> by xmlCtxtReadDoc with an explicitly specified "UTF-8" encoding, or
> by xmlParseBalancedChunkMemory with an explicitly specified "UTF8" encoding
> and the declaration section removed.
>
> So for "xml_parse" based functions (xml_in, texttoxml, xml_is_document,
> wellformed_xml) the database encoding is not important.
>
> c) xml_recv function does validation by xml_parse and translation to
> database encoding.
>
> Now I don't see a difference between b) and c), so my hypothesis about the
> necessity of using the recv interface was wrong.

Yes. You posted, on 2017-04-05, a test case not requiring the recv interface.

On Sat, Aug 19, 2017 at 09:13:50AM +0200, Pavel Stehule wrote:
> I didn't find any info on how to enable the libxml2 XPath functions for
> encodings other than UTF8 :( ??

http://xmlsoft.org/encoding.html is the relevant authority. To summarize, we
should send only UTF8 to libxml2.
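
As a rough illustration of that rule (a sketch only, not the code in
PostgreSQL's xml.c): transcode the stored bytes from the server encoding to
UTF8 and drop the XML declaration, so that its encoding="..." attribute can no
longer contradict the bytes the parser receives. The helper name
normalize_for_parser is hypothetical, and Python's standard-library parser
(expat, not libxml2) stands in for the real parser here.

```python
import re
import xml.etree.ElementTree as ET

# Matches an XML declaration at the very start of the document.
_XML_DECL = re.compile(rb"\A<\?xml\s.*?\?>", re.DOTALL)


def normalize_for_parser(raw: bytes, server_encoding: str) -> bytes:
    """Return `raw` as UTF-8 bytes with any XML declaration removed.

    Hypothetical helper: the bytes are assumed to be in `server_encoding`,
    whatever the declaration inside the document claims.
    """
    utf8 = raw.decode(server_encoding).encode("utf-8")
    return _XML_DECL.sub(b"", utf8, count=1)


# A document as it might be stored in a LATIN2 database; the declaration
# names that encoding, and the content holds one non-ASCII character.
stored = '<?xml version="1.0" encoding="ISO-8859-2"?><a>\u011b</a>'.encode(
    "iso-8859-2"
)

clean = normalize_for_parser(stored, "iso-8859-2")
root = ET.fromstring(clean)  # no declaration left, so UTF-8 is assumed
```

The parser then sees UTF8 bytes with no competing encoding claim, so it has
nothing to convert.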

On Sat, Aug 19, 2017 at 10:53:19PM +0200, Pavel Stehule wrote:
> I am sending a POC - it supports XPATH and XMLTABLE for non-UTF8
> server encodings.
>
> In this case, all strings should be converted to UTF8 before calling libxml2
> functions, and results should be converted back from UTF8.

Adding support for xpath in non-UTF8 databases is a v11 feature proposal.
Please start a new thread for this, and add it to the open CommitFest.
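
For readers following along, the convert-in, convert-out pattern described in
the quoted POC summary might look roughly like this. It is a sketch under
stated assumptions, not the patch itself: every function name is hypothetical,
and find_text_utf8 is a stand-in built on Python's standard library rather
than a libxml2 XPath call.

```python
import xml.etree.ElementTree as ET


def to_utf8(value: bytes, server_encoding: str) -> bytes:
    """Transcode bytes from the server encoding to UTF-8."""
    return value.decode(server_encoding).encode("utf-8")


def from_utf8(value: bytes, server_encoding: str) -> bytes:
    """Transcode UTF-8 bytes back to the server encoding."""
    return value.decode("utf-8").encode(server_encoding)


def find_text_utf8(document: bytes, path: bytes) -> bytes:
    """Stand-in for a UTF-8-only library call: UTF-8 in, UTF-8 out."""
    root = ET.fromstring(document)
    text = root.findtext(path.decode("utf-8"), default="")
    return text.encode("utf-8")


def find_text(document: bytes, path: bytes, server_encoding: str) -> bytes:
    """Convert every argument to UTF-8, call the library, convert back."""
    result = find_text_utf8(
        to_utf8(document, server_encoding),
        to_utf8(path, server_encoding),
    )
    return from_utf8(result, server_encoding)


# Document and path as they would arrive in a LATIN2 database.
doc = "<row><name>St\u011bhule</name></row>".encode("iso-8859-2")
found = find_text(doc, b"name", "iso-8859-2")
```

Note that the conversion back can fail when the result contains a character
the server encoding cannot represent; the sketch lets that error propagate.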

In this thread, would you provide the version of your patch that I described
in my 2017-08-08 post to this thread? That's a back-patchable bug fix.

> I found some previous experiments https://marc.info/?l=pgsql-bugs&m=123407176408688

https://wiki.postgresql.org/wiki/Todo#XML links to other background on this
feature proposal. See Tom Lane's review of a previous patch. Ensure your
patch does not have the problems he found during that review. Do that before
starting a thread for this feature.

Thanks,
nm
