Re: Another issue with invalid XML values

From: Florian Pflug <fgp(at)phlo(dot)org>
To: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Another issue with invalid XML values
Date: 2011-06-01 16:16:21
Message-ID: 6D5AD292-69E3-478E-B41A-0B2728CAB0CB@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Jun1, 2011, at 03:17 , Florian Pflug wrote:
> My nagging suspicion is that libxml reports errors like there via some callback function, and only returns a non-zero result if there are structural errors in the XML. But my experience with libxml is pretty limited, so maybe someone with more experience in this area can shed some light on this...

As it turns out, this is actually the case.

libxml reports some errors (like invalid xmlns attributes) via the error handler set using xmlSetGenericErrorFunc() but still returns zero (indicating success) from xmlCtxtReadDoc() and xmlParseBalancedChunkMemory().

If I modify xml_parse() to complain not only if one of these functions return non-zero, but also if xml_err_buf has non-zero length, invalid xmlns attributes are reported correctly.

However, the error function set using xmlSetGenericErrorFunc() cannot distinguish between error and warnings, so doing this causes XMLPARSE() to also complain about things like non-absolute namespace URIs (which are allowed but deprecated as far as I understand).

To fix that, xmlSetGenericErrorFunc() would probably have to be replace by xmlSetStructuredErrorFunc(). Structured error functions receive a pointer to an xmlError structore which, amongst other things, contains an xmlErrorLevel (NONE, WARNING, ERROR, FATAL).

While digging through the code in src/backend/utils/adt/xml.c, I also noticed that we set a global error handler instead of a per-context one. I guess this is because xmlParseBalancedChunkMemory(), which we use to parse XML fragments, doesn't provide a way to pass in a context but rather creates it itself. Still, I wonder if there isn't some other API which we could use which does allow us to specify a context. Again, it'd be nice if someone more familiar with this code could explain the reasons behind the current design.

Anyway, I'll try to come up with a patch that replaces xmlSetGenericErrorFunc() with xmlSetStructuredErrorFunc().

best regards,
Florian Pflug

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Merlin Moncure 2011-06-01 16:24:25 Re: PQdeleteTuple function in libpq
Previous Message Jeff Janes 2011-06-01 15:58:24 Re: patch for new feature: Buffer Cache Hibernation