Re: Another issue with invalid XML values

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Another issue with invalid XML values
Date: 2011-06-24 14:15:35
Message-ID: 3B520CA8-B864-4A0A-A840-D39068AA94A5@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Jun24, 2011, at 13:24 , Noah Misch wrote:
> On Wed, Jun 22, 2011 at 12:47:42PM +0200, Florian Pflug wrote:
>> Updated patch attached.
>
> This builds and passes all tests on both test systems I used previously
> (CentOS 5 with libxml2-2.6.26-2.1.2.8.el5_5.1 and Ubuntu 8.04 LTS with libxml2
> 2.6.31.dfsg-2ubuntu1.6). Tested behaviors look good, too.
>
>> On Jun20, 2011, at 22:45 , Noah Misch wrote:
>>> On Mon, Jun 20, 2011 at 09:15:51PM +0200, Florian Pflug wrote:
>>>> On Jun20, 2011, at 19:57 , Noah Misch wrote:
>>> Assuming it's sufficient to save/restore xmlStructuredErrorContext when it's
>>> available and xmlGenericErrorContext otherwise, I would just add an Autoconf
>>> check to identify which one to use.
>>
>> It seems that before xmlStructuredErrorContext was introduced, the structured
>> and the generic error handler shared an error context, so doing what you
>> suggested looks sensible.
>>
>> I don't have any autoconf-fu whatsoever, though, so I'm not 100% certain I did
>> this the right way. I basically copied a similar check (for setlongjump AFAIR)
>> and adapted it to check for xmlStructuredErrorContext instead.
>
> Turned out fine. Note that, more often than not, committers ask for patches
> not to contain the diff of the generated "configure" itself. (I personally
> don't mind it when the diff portion is small and generated with the correct
> version of Autoconf, as in this patch.)

Ah, OK. I didn't know what project policy was there, so I figured I'd include
it since the configure script is tracked by git too. Now that I know, I'll remove
it from the final patch.

>>>> I believe that the compatibility risk is pretty low here, and removing
>>>> the meaningless prefix makes the error message are bit nicer to read.
>>>> But if you are concerned that there maybe applications out there who
>>>> parse the error text, we could of course add it back. I must say that
>>>> I don't really know what our policy regarding error message stability is...
>>>
>>> If the libxml2 API makes it a major pain to preserve exact messages, I
>>> wouldn't worry.
>>
>> It'd only require us to prepend "Entity: " to the message string, so
>> there's no pain there. The question is rather whether we want to continue
>> having a pure noise word in front of every libxml error message for
>> the sake of compatibility.
>
> There's also the " parser error :" difference; would that also be easy to
> re-add? (I'm assuming it's not a constant but depends on the xmlErrorDomain.)

It's the string representation of the error domain and error level. Unfortunately,
the logic which builds that string representation is contained in the static
function xmlReportError() deep within libxml, and that function doesn't seem
to be called at all for errors reported via a structured error handler :-(

So re-adding that would mean duplicating that code within our xml.c, which
seems undesirable...

>> I vote "Nay", on the grounds that I estimate the actual breakage from
>> such a change to be approximately zero. Plus the fact that libxml
>> error messages aren't completely stable either. I had to suppress the
>> DETAIL output for one of the regression tests to make them work for
>> both 2.6.23 and 2.7.8; libxml chooses to quote an invalid namespace
>> URI in it's error message in one of these versions but not the other...
>
> I'm not particularly worried about actual breakage; I'm just putting on the
> "one change per patch"-lawyer hat. All other things being equal, I'd prefer
> one patch that changes the mechanism for marshalling errors while minimally
> changing the format of existing errors, then another patch that proposes new
> formatting. (Granted, the formatting-only patch would probably founder.) But
> it's a judgement call, and this change is in a grey area. I'm fine with
> leaving it up to the committer.

I agree with that principle, in principle. In the light of my paragraph above
about how we'd need to duplicate libxml code to keep the error messages the same,
however, I'll leave it up to the committer as you suggested.

> A few minor code comments:
>
>> *** a/src/backend/utils/adt/xml.c
>> --- b/src/backend/utils/adt/xml.c
>
>> + static bool xml_error_initialized = false;
>
> Since xml_error_initialized is only used for asserts and now redundant with
> asserts about xml_strictness == PG_XML_STRICTNESS_NONE, consider removing it.

You're absolutely right. Will remove.

>> ! /*
>> ! * pg_xml_done --- restore libxml state after pg_xml_init().
>> ! *
>> ! * This must be called if pg_xml_init() was called with a
>> ! * purpose other than PG_XML_STRICTNESS_NONE. Resets libxml's global state
>> ! * (i.e. the structured error handler) to the original state.
>
> The first sentence is obsolete.

Right again. Will remove.

>> *** a/src/include/utils/xml.h
>> --- b/src/include/utils/xml.h
>> *************** typedef enum
>> *** 68,74 ****
>> XML_STANDALONE_OMITTED
>> } XmlStandaloneType;
>>
>> ! extern void pg_xml_init(void);
>> extern void xml_ereport(int level, int sqlcode, const char *msg);
>> extern xmltype *xmlconcat(List *args);
>> extern xmltype *xmlelement(XmlExprState *xmlExpr, ExprContext *econtext);
>> --- 68,87 ----
>> XML_STANDALONE_OMITTED
>> } XmlStandaloneType;
>>
>> ! typedef enum
>> ! {
>> ! PG_XML_STRICTNESS_NONE /* No error handling */,
>> ! PG_XML_STRICTNESS_LEGACY /* Ignore notices/warnings/errors unless
>> ! * function result indicates error condition
>> ! */,
>> ! PG_XML_STRICTNESS_WELLFORMED/* Ignore non-parser notices/warnings/errors */,
>> ! PG_XML_STRICTNESS_ALL /* Report all notices/warnings/errors */,
>> ! } PgXmlStrictnessType;
>
> We don't generally prefix backend names with Pg/PG. Also, I think the word
> "Type" in "PgXmlStrictnessType" is redundant. How about just XmlStrictness?

I agree with removing the "Type" suffix. I'm not so sure about the "Pg"
prefix, though. I've added that after actually hitting a conflict between
one of this enum's constants and some constant defined by libxml. Admittedly,
that was *before* I renamed the type to "PgXmlStrictnessType", so removing
the "Pg" prefix now would probably work. But it seems a bit close for
comfort - libxml defines a ton of constants named XML_something.

Still, if you feel that the "Pg" prefix looks to weird and believe that it's
better to wait until there's an actual clash before renaming things, I won't
object further. Just wanted to bring the issue to your attention...

I'll post an updated patch once that last issue is resolved.

best regards,
Florian Pflug

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2011-06-24 14:37:24 Re: silent_mode and LINUX_OOM_ADJ
Previous Message Robert Haas 2011-06-24 13:13:44 Re: pg_upgrade defaulting to port 25432