Re: possible encoding issues with libxml2 functions

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: possible encoding issues with libxml2 functions
Date: 2017-03-12 19:36:58
Message-ID: CAFj8pRADS6yUVwziAS_v2c_oPEse6RZgk2WnSa8o3i-GwTKmaA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2017-03-12 0:56 GMT+01:00 Noah Misch <noah(at)leadboat(dot)com>:

> On Mon, Feb 20, 2017 at 07:48:18PM +0100, Pavel Stehule wrote:
> > Today I played with xml_recv function and with xml processing functions.
> >
> > xml_recv function ensures correct encoding from document encoding to
> server
> > encoding. But the decl section holds original encoding info - that should
> > be obsolete after encoding. Sometimes we solve this issue by removing
> decl
> > section - see the xml_out function.
> >
> > Sometimes we don't do it - lot of functions uses direct conversion from
> > xmltype to xmlChar.
>
> > There are possible two fixes
> >
> > a) clean decl on input - the encoding info can be removed from decl part
> >
> > b) use xml_out_internal everywhere before transformation to
> > xmlChar. pg_xmlCharStrndup can be good candidate.
>
> I'd prefer (a) if the xml type were a new feature, because no good can
> come of
> storing an encoding in each xml field when we know the actual encoding is
> the
> database encoding. However, if you implemented (a), we'd still see
> untreated
> values brought over via pg_upgrade. Therefore, I would try (b) first. I
> suspect the intent of xml_parse() was to implement (b); it will be
> interesting
> to see your test case that malfunctions.
>

I looked there again and I found so this issue is related to xpath function
only

Functions based on xml_parse are working without problems. xpath_internal
uses own direct xmlCtxtReadMemory without correct encoding sanitation.

so fix is pretty simple

diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c

index f81cf489d2..89aae48cb3 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3874,9 +3874,11 @@ xpath_internal(text *xpath_expr_text, xmltype *data,
ArrayType *namespaces,
ns_count = 0;
}

- datastr = VARDATA(data);
- len = VARSIZE(data) - VARHDRSZ;
+ datastr = xml_out_internal(data, 0);
+ len = strlen(datastr);
+
xpath_len = VARSIZE(xpath_expr_text) - VARHDRSZ;
+
if (xpath_len == 0)
ereport(ERROR,
(errcode(ERRCODE_DATA_EXCEPTION),

Regards

Pavel

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2017-03-12 19:38:58 Re: [REVIEW] macaddr 64 bit (EUI-64) datatype support
Previous Message Tom Lane 2017-03-12 19:29:37 Re: \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)