Re: Another issue with invalid XML values

From: Noah Misch <noah(at)leadboat(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Another issue with invalid XML values
Date: 2011-06-24 11:24:33
Message-ID: 20110624112433.GE8044@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Florian,

On Wed, Jun 22, 2011 at 12:47:42PM +0200, Florian Pflug wrote:
> Updated patch attached.

This builds and passes all tests on both test systems I used previously
(CentOS 5 with libxml2-2.6.26-2.1.2.8.el5_5.1 and Ubuntu 8.04 LTS with libxml2
2.6.31.dfsg-2ubuntu1.6). Tested behaviors look good, too.

> On Jun20, 2011, at 22:45 , Noah Misch wrote:
> > On Mon, Jun 20, 2011 at 09:15:51PM +0200, Florian Pflug wrote:
> >> On Jun20, 2011, at 19:57 , Noah Misch wrote:
> > Assuming it's sufficient to save/restore xmlStructuredErrorContext when it's
> > available and xmlGenericErrorContext otherwise, I would just add an Autoconf
> > check to identify which one to use.
>
> It seems that before xmlStructuredErrorContext was introduced, the structured
> and the generic error handler shared an error context, so doing what you
> suggested looks sensible.
>
> I don't have any autoconf-fu whatsoever, though, so I'm not 100% certain I did
> this the right way. I basically copied a similar check (for setlongjump AFAIR)
> and adapted it to check for xmlStructuredErrorContext instead.

Turned out fine. Note that, more often than not, committers ask for patches
not to contain the diff of the generated "configure" itself. (I personally
don't mind it when the diff portion is small and generated with the correct
version of Autoconf, as in this patch.)

> >>>> ! XML_PURPOSE_LEGACY /* Save error message only, don't set error flag */,
> >>>
> >>> It's fine to set the flag for legacy users, considering they could just not
> >>> read it. The important distinction seems to be that we don't emit warnings or
> >>> notices in this case. Is that correct? If so, the comment should reflect
> >>> that emphasis. Then, consider updating the flag unconditionally.
> >>
> >> I took me a while to remember while I did it that way, but I think I have now.
> >>
> >> I initialled wanted to add an Assert(!xml_error_occurred) to catch any
> >> missing XML_REPORT_ERROR() calls. I seems to have forgotten to do that,
> >> though...
> >>
> >> So I guess I should either refrain from setting the flag as you suggested,
> >> or add such an Assert(). I think I very slightly prefer the latter, what
> >> do you think?
> >
> > I do like the idea of that assert. How about setting the flag anyway, but
> > making it "Assert(xml_purpose == XML_PURPOSE_LEGACY || !xml_error_occurred)"?
>
> I added the Assert, but didn't make the setting of the error flag unconditional.
>
> If I did that, XML_CHECK_AND_EREPORT() would stop working for the LEGACY
> case. Now, that case currently isn't exercised, but breaking nevertheless
> seemed wrong to me.

Okay.

> >>>> *** a/src/test/regress/expected/xml.out
> >>>> --- b/src/test/regress/expected/xml.out
> >>>> *************** INSERT INTO xmltest VALUES (3, '<wrong')
> >>>> *** 8,14 ****
> >>>> ERROR: invalid XML content
> >>>> LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
> >>>> ^
> >>>> ! DETAIL: Entity: line 1: parser error : Couldn't find end of Start Tag wrong line 1
> >>>> <wrong
> >>>> ^
> >>>> SELECT * FROM xmltest;
> >>>> --- 8,14 ----
> >>>> ERROR: invalid XML content
> >>>> LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
> >>>> ^
> >>>> ! DETAIL: line 1: Couldn't find end of Start Tag wrong line 1
> >>>
> >>> Any reason to change the error text this way?
> >>
> >> The "Entity:" prefix is added by libxml only for messages reported
> >> to the generic error handler. It *doesn't* refer to entities as defined
> >> in xml, but instead used in place of the file name if libxml
> >> doesn't have that at hand (because it's parsing from memory).
> >>
> >> So that "Entity:" prefix really doesn't have any meaning whatsoever.
> >
> > So xmlParserPrintFileContext() sends different content to the generic error
> > handler from what "natural" calls to the handler would send?
>
> xmlParserPrintFileContext() only generates the "context" part of the error
> message. In the example above, these are the two lines
> <wrong
> ^
> These lines don't contain the "Entity:" prefix - neither with the patch
> attached nor without.

Ah, my mistake.

> >> I believe that the compatibility risk is pretty low here, and removing
> >> the meaningless prefix makes the error message are bit nicer to read.
> >> But if you are concerned that there maybe applications out there who
> >> parse the error text, we could of course add it back. I must say that
> >> I don't really know what our policy regarding error message stability is...
> >
> > If the libxml2 API makes it a major pain to preserve exact messages, I
> > wouldn't worry.
>
> It'd only require us to prepend "Entity: " to the message string, so
> there's no pain there. The question is rather whether we want to continue
> having a pure noise word in front of every libxml error message for
> the sake of compatibility.

There's also the " parser error :" difference; would that also be easy to
re-add? (I'm assuming it's not a constant but depends on the xmlErrorDomain.)

> I vote "Nay", on the grounds that I estimate the actual breakage from
> such a change to be approximately zero. Plus the fact that libxml
> error messages aren't completely stable either. I had to suppress the
> DETAIL output for one of the regression tests to make them work for
> both 2.6.23 and 2.7.8; libxml chooses to quote an invalid namespace
> URI in it's error message in one of these versions but not the other...

I'm not particularly worried about actual breakage; I'm just putting on the
"one change per patch"-lawyer hat. All other things being equal, I'd prefer
one patch that changes the mechanism for marshalling errors while minimally
changing the format of existing errors, then another patch that proposes new
formatting. (Granted, the formatting-only patch would probably founder.) But
it's a judgement call, and this change is in a grey area. I'm fine with
leaving it up to the committer.

A few minor code comments:

> *** a/src/backend/utils/adt/xml.c
> --- b/src/backend/utils/adt/xml.c

> + static bool xml_error_initialized = false;

Since xml_error_initialized is only used for asserts and now redundant with
asserts about xml_strictness == PG_XML_STRICTNESS_NONE, consider removing it.

> ! /*
> ! * pg_xml_done --- restore libxml state after pg_xml_init().
> ! *
> ! * This must be called if pg_xml_init() was called with a
> ! * purpose other than PG_XML_STRICTNESS_NONE. Resets libxml's global state
> ! * (i.e. the structured error handler) to the original state.

The first sentence is obsolete.

> *** a/src/include/utils/xml.h
> --- b/src/include/utils/xml.h
> *************** typedef enum
> *** 68,74 ****
> XML_STANDALONE_OMITTED
> } XmlStandaloneType;
>
> ! extern void pg_xml_init(void);
> extern void xml_ereport(int level, int sqlcode, const char *msg);
> extern xmltype *xmlconcat(List *args);
> extern xmltype *xmlelement(XmlExprState *xmlExpr, ExprContext *econtext);
> --- 68,87 ----
> XML_STANDALONE_OMITTED
> } XmlStandaloneType;
>
> ! typedef enum
> ! {
> ! PG_XML_STRICTNESS_NONE /* No error handling */,
> ! PG_XML_STRICTNESS_LEGACY /* Ignore notices/warnings/errors unless
> ! * function result indicates error condition
> ! */,
> ! PG_XML_STRICTNESS_WELLFORMED/* Ignore non-parser notices/warnings/errors */,
> ! PG_XML_STRICTNESS_ALL /* Report all notices/warnings/errors */,
> ! } PgXmlStrictnessType;

We don't generally prefix backend names with Pg/PG. Also, I think the word
"Type" in "PgXmlStrictnessType" is redundant. How about just XmlStrictness?

Thanks,
nm

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2011-06-24 11:30:32 Re: Latch implementation that wakes on postmaster death on both win32 and Unix
Previous Message Heikki Linnakangas 2011-06-24 11:01:57 silent_mode and LINUX_OOM_ADJ