Re: possible encoding issues with libxml2 functions

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: possible encoding issues with libxml2 functions
Date: 2017-10-17 04:06:40
Message-ID: CAFj8pRCc3shZ+bbHo3QDX-z=i_sBvaiTLqmYit+RGdQUqcyOOQ@mail.gmail.com
Lists: pgsql-hackers

2017-10-17 1:57 GMT+02:00 Noah Misch <noah(at)leadboat(dot)com>:

> On Sun, Aug 20, 2017 at 10:37:10PM +0200, Pavel Stehule wrote:
> > > We have xpath-bugfix.patch and xpath-parsing-error-fix.patch. Both are
> > > equivalent under supported use cases (xpath in UTF8 databases). Among
> > > non-supported use cases, they each make different things better and
> > > different things worse. We should prefer to back-patch the version
> > > harming fewer applications. I expect non-ASCII data is more common than
> > > xml declarations with "encoding" attribute, so xpath-bugfix.patch will
> > > harm fewer applications.
> > >
> > > Having said that, I now see a third option. Condition this thread's
> > > patch's effects on GetDatabaseEncoding()==PG_UTF8. That way, we fix
> > > supported cases, and we remain bug-compatible in unsupported cases. I
> > > think that's better than the other options discussed so far. If you
> > > agree, please send a patch based on xpath-bugfix.patch with the
> > > GetDatabaseEncoding()==PG_UTF8 change and the two edits I described
> > > earlier.
> > >
> >
> > I am sorry, it has been too long a day today. Do you mean something like
> >
> > diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
> > index 24229c2dff..9fd6f3509f 100644
> > --- a/src/backend/utils/adt/xml.c
> > +++ b/src/backend/utils/adt/xml.c
> > @@ -3914,7 +3914,14 @@ xpath_internal(text *xpath_expr_text, xmltype *data, ArrayType *namespaces,
> >          if (ctxt == NULL || xmlerrcxt->err_occurred)
> >              xml_ereport(xmlerrcxt, ERROR, ERRCODE_OUT_OF_MEMORY,
> >                          "could not allocate parser context");
> > -        doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL, NULL, 0);
> > +
> > +        /*
> > +         * Passed XML is always in server encoding. When server encoding
> > +         * is UTF8, we can pass this information to libxml2 to ignore
> > +         * possible invalid encoding declaration in XML document.
> > +         */
> > +        doc = xmlCtxtReadMemory(ctxt, (char *) string, len, NULL,
> > +                GetDatabaseEncoding() == PG_UTF8 ? "UTF-8" : NULL, 0);
> >          if (doc == NULL || xmlerrcxt->err_occurred)
> >              xml_ereport(xmlerrcxt, ERROR, ERRCODE_INVALID_XML_DOCUMENT,
> >                          "could not parse XML document");
>
> No, that doesn't match my description above. I don't see a way to clarify
> the description. Feel free to try again. Alternately, if you wait, I will
> eventually construct the patch I described.
>

Please, if you can, try to write it. I am a little bit lost :)

Regards

Pavel
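
A minimal sketch of the conditional used in the quoted diff, isolated so that it compiles without PostgreSQL or libxml2. It only illustrates the idea of forcing "UTF-8" when the database encoding is UTF8 and otherwise passing NULL so that libxml2 behaves as before. The enum and the function name are hypothetical stand-ins, not PostgreSQL identifiers, and this is not the patch Noah describes above, which he says the quoted diff does not match.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for PostgreSQL's encoding IDs (assumption: the
 * real code compares GetDatabaseEncoding() against PG_UTF8).
 */
typedef enum
{
    DEMO_SQL_ASCII,
    DEMO_LATIN1,
    DEMO_UTF8
} demo_encoding;

/*
 * Return the value to pass as the "encoding" argument of
 *   xmlCtxtReadMemory(ctxt, buffer, size, URL, encoding, options).
 *
 * For a UTF8 database the data is known to be UTF-8, so the encoding is
 * stated explicitly.  For any other database encoding NULL is returned,
 * which leaves encoding detection to libxml2, as in the unpatched call.
 */
static const char *
demo_xml_encoding_override(demo_encoding db_encoding)
{
    return (db_encoding == DEMO_UTF8) ? "UTF-8" : NULL;
}
```

In the diff above, the same expression appears inline as the fifth argument of xmlCtxtReadMemory().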
