possible encoding issues with libxml2 functions

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: possible encoding issues with libxml2 functions
Date: 2017-02-20 18:48:18
Message-ID: CAFj8pRC-dM=tT=QkGi+Achkm+gwPmjyOayGuUfXVumCxkDgYWg@mail.gmail.com
Lists: pgsql-hackers

Hi

Today I played with the xml_recv function and with the XML processing functions.

The xml_recv function converts the document from its declared encoding to
the server encoding. But the decl section still holds the original encoding
info, which is obsolete after the conversion. Sometimes we solve this issue
by removing the encoding from the decl section - see the xml_out function.
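
A minimal sketch of the mismatch described above, in Python rather than the server's C code. The helper name transcode, the sample document, and the choice of ISO-8859-2 and UTF-8 are assumptions made for illustration only; it does not call xml_recv or libxml2.

```python
# Hypothetical helper: mimics a receive-side conversion that re-encodes
# the bytes but leaves the text of the XML declaration untouched.
def transcode(doc: bytes, declared: str, server: str = "utf-8") -> bytes:
    return doc.decode(declared).encode(server)


# "\u017e" is LATIN SMALL LETTER Z WITH CARON, a single byte in ISO-8859-2.
original = (
    '<?xml version="1.0" encoding="ISO-8859-2"?><r>\u017e</r>'
).encode("iso-8859-2")

stored = transcode(original, "iso-8859-2")

# The payload bytes are now UTF-8 ...
payload_is_utf8 = "\u017e".encode("utf-8") in stored
# ... but the declaration still names the old encoding.
decl_is_stale = b'encoding="ISO-8859-2"' in stored

print(payload_is_utf8, decl_is_stale)
```

A parser that trusts the declaration would read the stored bytes with the wrong encoding, which is the situation the rest of this mail is about.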

Sometimes we don't do it - a lot of functions use a direct conversion from
xmltype to xmlChar. A wrong encoding in the decl section can break the
libxml2 parser with this error:

ERROR: could not parse XML document
DETAIL: input conversion failed due to input error, bytes 0x88 0x3C 0x2F 0x72
line 1: switching encoding: encoder error

This error is not frequent, but it is hard to track down, because there is
a small but important difference between the printed XML and the XML that
is actually used.

There are two possible fixes:

a) clean the decl on input - the encoding info can be removed from the decl part

b) use xml_out_internal everywhere before the transformation to
xmlChar. pg_xmlCharStrndup could be a good candidate.
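
To make option (a) concrete, here is a rough Python sketch of removing only the encoding pseudo-attribute from the XML declaration. The function name strip_decl_encoding and the regular expression are hypothetical, and a real fix would live in the server's C code; the sketch also assumes the document has already been converted to UTF-8.

```python
import re
import xml.etree.ElementTree as ET

# Matches the start of an XML declaration up to and including the
# encoding pseudo-attribute; group 1 keeps '<?xml version="..."'.
_DECL_ENCODING = re.compile(
    rb"""^(<\?xml\s+version\s*=\s*(?:"[^"]*"|'[^']*'))"""
    rb"""\s+encoding\s*=\s*(?:"[^"]*"|'[^']*')"""
)


def strip_decl_encoding(doc: bytes) -> bytes:
    """Drop the encoding from the XML declaration, if there is one.

    Hypothetical helper for illustration only.
    """
    return _DECL_ENCODING.sub(rb"\1", doc, count=1)


# UTF-8 bytes carrying a stale ISO-8859-2 declaration.
stored = '<?xml version="1.0" encoding="ISO-8859-2"?><r>\u017e</r>'.encode("utf-8")
fixed = strip_decl_encoding(stored)

# Without an encoding declaration the parser falls back to UTF-8,
# which matches the actual bytes.
root = ET.fromstring(fixed)
```

The version and any standalone pseudo-attribute are left alone, so only the stale encoding info is lost.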

Comments?

Regards

Pavel
