Re: machine-readable explain output v4

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: machine-readable explain output v4
Date: 2009-08-09 23:21:35
Message-ID: 4A7F59FF.4060302@dunslane.net
Lists: pgsql-hackers

Robert Haas wrote:
> The one significant representational choice that I'm aware of having
> made is to use nested tags rather than attributes in the XML format.
> This seems to me to offer several advantages. First, it's clearly
> impossible to standardize on attributes, because attributes can only
> be text, and it seems to me that if we're going to try to output
> structured data, we want to take that as far as we can, and we have
> attributes (like sort keys) that are lists rather than scalars. Using
> tags means that they can have substructure when needed. Second, it
> seems likely to me that people will want to extend explain further in
> the future: indeed, that was the whole point of the explain-options
> patch which was already committed. That's pretty simple in the
> current design - just add a few more calls to ExplainPropertyText or
> ExplainPropertyList in the appropriate place, and you're done. I'm
> pretty sure that splitting things up between attributes and nested
> tags would complicate such modifications.
>
>
>

In general, in XML one uses an attribute for a named property of an
object that can only have one value at a time. A classic example is the
dimensions of an object - it can only have one width and height.
Children (nested tags, particularly) are used for things it can have an
arbitrary number of, or things which in turn can have children. The
HTML <p> and <body> elements are (respectively) examples of these.
Attribute values in particular should generally be short - I recently saw
an example that had an entire image hex encoded in an XML attribute,
which struck me as just horrible. Enumerations, date and time values,
booleans, measurements - these are common types of attribute values.
Extracting a value from an attribute is no more or less difficult than
from a nested tag, using the XPath query language.

The XML Schema standard is a language for specifying the structure of a
given XML document type, and while it is undoubtedly complex, it is also
much more powerful than the older DTD mechanism. I think we should be
creating (and publishing) an XML Schema specification for any XML
documents we are producing. There are a number of members of the
community who are equipped to help produce these.

There is probably a good case for using an explicit namespace with such
docs. So we might have something like:

<pg:explain
xmlns:pg="http://www.postgresql.org/xmlspecs/explain/v1.xsd"> ....

BTW, has anyone tried validating the XML at all? I just looked very
briefly at the patch at
<http://archives.postgresql.org/pgsql-hackers/2009-07/msg01944.php> and
I noticed this which makes me suspicious:

+ if (es.format == EXPLAIN_FORMAT_XML)
+     appendStringInfoString(es.str,
+         "<explain xmlns=\"http://www.postgresql.org/2009/explain\";>\n");

That ";" after the attribute is almost certainly wrong. This is a classic
case of what I was talking about a month or two ago. Building up XML (or
any structured doc, really; XML is not special in this regard) by ad hoc
methods is horribly error prone. If you don't want to rely on libxml,
then I think you need to develop a lightweight abstraction rather than
just appending to a StringInfo.
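
To make "lightweight abstraction" a bit more concrete, here is a rough sketch of the kind of thing I mean. None of it is from the patch: the XmlWriter type and the xw_* functions are hypothetical names, and the fixed-size buffer is a simplification (in the backend this would presumably sit on top of a StringInfo). The point is only that the writer, not the caller, decides where ">" goes, escapes attribute values and text, and remembers which element to close.

```c
#include <stddef.h>
#include <string.h>

#define XW_MAX_DEPTH 32
#define XW_BUF_SIZE  1024

/* Hypothetical writer state: output buffer plus a stack of open elements. */
typedef struct XmlWriter
{
    char        buf[XW_BUF_SIZE];
    size_t      len;
    const char *open[XW_MAX_DEPTH]; /* names must outlive the writer */
    int         depth;
    int         in_start_tag;       /* "<name ..." emitted, ">" still pending */
    int         error;              /* set on overflow or misuse */
} XmlWriter;

void xw_init(XmlWriter *w)
{
    w->buf[0] = '\0';
    w->len = 0;
    w->depth = 0;
    w->in_start_tag = 0;
    w->error = 0;
}

/* Append a string verbatim; internal use only. */
static void xw_raw(XmlWriter *w, const char *s)
{
    size_t n = strlen(s);

    if (w->error || w->len + n >= XW_BUF_SIZE)
    {
        w->error = 1;
        return;
    }
    memcpy(w->buf + w->len, s, n + 1);
    w->len += n;
}

/* Append a string with the XML special characters escaped. */
static void xw_escaped(XmlWriter *w, const char *s)
{
    for (; *s; s++)
    {
        char c[2] = { *s, '\0' };

        switch (*s)
        {
            case '&': xw_raw(w, "&amp;");  break;
            case '<': xw_raw(w, "&lt;");   break;
            case '>': xw_raw(w, "&gt;");   break;
            case '"': xw_raw(w, "&quot;"); break;
            default:  xw_raw(w, c);        break;
        }
    }
}

/* Emit the pending ">" of a start tag, if there is one. */
static void xw_finish_start_tag(XmlWriter *w)
{
    if (w->in_start_tag)
    {
        xw_raw(w, ">");
        w->in_start_tag = 0;
    }
}

void xw_open(XmlWriter *w, const char *name)
{
    xw_finish_start_tag(w);
    if (w->depth >= XW_MAX_DEPTH)
    {
        w->error = 1;
        return;
    }
    w->open[w->depth++] = name;
    xw_raw(w, "<");
    xw_raw(w, name);
    w->in_start_tag = 1;
}

/* Attributes are only legal directly after xw_open(). */
void xw_attr(XmlWriter *w, const char *name, const char *value)
{
    if (!w->in_start_tag)
    {
        w->error = 1;
        return;
    }
    xw_raw(w, " ");
    xw_raw(w, name);
    xw_raw(w, "=\"");
    xw_escaped(w, value);
    xw_raw(w, "\"");
}

void xw_text(XmlWriter *w, const char *text)
{
    xw_finish_start_tag(w);
    xw_escaped(w, text);
}

/* Close the innermost open element; the caller never spells the end tag. */
void xw_close(XmlWriter *w)
{
    if (w->depth == 0)
    {
        w->error = 1;
        return;
    }
    xw_finish_start_tag(w);
    xw_raw(w, "</");
    xw_raw(w, w->open[--w->depth]);
    xw_raw(w, ">");
}
```

With something like this the caller writes xw_open(&w, "explain"), then xw_attr(&w, "xmlns", "http://www.postgresql.org/2009/explain"), and never types a quote, an angle bracket or an end tag by hand, so a stray character after the attribute simply cannot happen. List-valued properties such as sort keys fall out naturally as nested xw_open/xw_text/xw_close calls.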

cheers

andrew
