Re: Another issue with invalid XML values

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Another issue with invalid XML values
Date: 2011-07-20 10:37:48
Message-ID: 75591DCB-C4A2-447C-8F0E-A997024106F8@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Jul20, 2011, at 01:42 , Tom Lane wrote:
> Florian Pflug <fgp(at)phlo(dot)org> writes:
>> Updated patch attached. Do you think this is "Ready for Committer"?
>
> I've been looking through this patch. While it's mostly good, I'm
> pretty unhappy with the way that the pg_xml_init/pg_xml_done code is
> deliberately designed to be non-reentrant (ie, throw an Assert if
> pg_xml_init is called twice without pg_xml_done in between).
> There are at least two issues with that:

Hm, yeah, I did it that way because I didn't see an immediate need
to support nested pg_xml_init/done calls. But you're right - since
we're already nearly there, we might as well go all the way.

> 1. If you forget pg_xml_done in some code path, you'll find out from
> an Assert at the next pg_xml_init, which is probably far away from where
> the actual problem is.

Very true. In fact, I did miss one pg_xml_done() call in the xml2
contrib module initially, and it took me a while to locate the place
it was missing.

But won't me miss that error entirely if me make it re-entrant?

> 2. I don't think it's entirely unlikely that uses of libxml could be
> nested.
>
> xpath_table in particular calls an awful lot of stuff between
> pg_xml_init and pg_xml_done, and is at the very least open to loss of
> control via an elog before it's called pg_xml_done.

True. But note that this function's error handling leaves something
to be desired anyway - I think it'll leak memory if it elog()s, for
example.

I was tempted to make all the functions in xml2/ use TRY/CATCH blocks
like the ones in core's xml.c do. The reasons I held back was I that
I felt these cleanups weren't part of the problem my patch was trying
to solve. At least according to our docs the xml2 contrib module is
also deprecated, which was another reason to make only the absolutely
necessary adjustments.

> I think this patch has already paid 90% of the notational price for
> supporting fully re-entrant use of libxml. What I'm imagining is
> that we move all five of the static variables (xml_strictness,
> xml_err_occurred, xml_err_buf, xml_structuredErrorFunc_saved,
> xml_structuredErrorContext_saved) into a struct that's palloc'd
> by pg_xml_init and eventually passed to pg_xml_done. It could be
> passed to xml_errorHandler via the currently-unused context argument.
> A nice side benefit is that we could get rid of PG_XML_STRICTNESS_NONE.

I'd marginally prefer for the caller, not pg_xml_init(), to be responsible
for allocating the struct. That gives the caller the option to simply
allocate it on the stack.

> Now the risk factor if we do that is that if someone misses a
> pg_xml_done call, we leave an error handler installed with a context
> argument that's probably pointing at garbage, and if someone then tries
> to use libxml without re-establishing their error handler, they've
> got problems. But they'd have problems anyway with the current form of
> the patch.

Currently it'd actually work in non-assert-enabled builds, so long
as no third-party library depends on its libxml error handler being
restored by us (which the old code simply assume never to be the case).

> We could provide some defense against this by including a
> magic identifier value in the palloc'd struct and making
> xml_errorHandler check it before doing anything dangerous. Also, we
> could make pg_xml_done warn if libxml's current context pointer is
> different from the struct passed to it, which would provide another
> means of detection that somebody had missed a cleanup call.

There's one additional hazard. Suppose you have a functions f() and
g() defined as

g() {
PgXmlState* xml_state = pg_xml_init();
...
/* Ups. No pg_xml_done() calll here */
}

f() {
PgXmlState* xml_state = pg_xml_init();

g();

xmlSomeLibXmlFunctionCall();
XML_CHECK_AND_EREPORT(xml_state, ...);

pg_xml_done();
}

Error reported by xmlSomeLibXmlFunctionCall() will now be
missed by the XML_CHECK_AND_EREPORT in f(), because they'll
modify g()'s xml_state, not f()'s.

So we really ought to make pg_xml_done() complain if libxml's
current error context isn't what it expects.

> Unless someone sees a major hole in this idea, or a better way to do it,
> I'm going to modify the patch along those lines and commit.

If pg_xml_done() and the error handler do the safety check you suggested,
it seems fine.

best regards,
Florian Pflug

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Florian Pflug 2011-07-20 10:49:09 Re: Initial Review: JSON contrib modul was: Re: Another swing at JSON
Previous Message Mark Kirkwood 2011-07-20 09:21:48 Re: Re: patch review : Add ability to constrain backend temporary file space