Re: machine-readable explain output v4

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: machine-readable explain output v4
Date: 2009-08-09 22:19:37
Message-ID: 603c8f070908091519g3f54248bk93e82773ad900432@mail.gmail.com
Lists: pgsql-hackers

On Sun, Aug 9, 2009 at 3:57 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Revised patch attached.  I'm not convinced this is as good as it can
>> be, but I've been looking at this patch for so long that I'm starting
>> to get cross-eyed, and I'd like Tom to at least have a look at this
>> and assess it before we run out of CommitFest.
>
> I'm starting to look at this now.  I feel unqualified to opine on the
> quality of the XML/JSON schema design, and given the utter lack of
> documentation of what that design is, I'm not sure that anyone who
> could comment on it has done so.  Could we have a spec please?

*scratches head*

You're not the first person to make that request, and I'd like to
respond to it well, but I don't really know what to write. Most of
the discussion about the XML/JSON output format thus far has been
around things like whether we should downcase everything, and even the
people offering these comments have mostly labelled them with words to
the effect of "I know this is trivial but...". I think the reason
for this is that explain output is fundamentally a tree,
and XML and JSON both have ways of representing a tree with properties
hanging off the nodes, and this patch uses those ways. I can't figure
out what else there is, so I don't know which alternative designs I'm
supposed to be explaining my reasons for rejecting.

The one significant representational choice that I'm aware of having
made is to use nested tags rather than attributes in the XML format.
This seems to me to offer several advantages. First, it's clearly
impossible to standardize on attributes, because attribute values can
only be text. If we're going to output structured data, I think we
want to take that as far as we can, and some properties (like sort
keys) are lists rather than scalars. Using nested
tags means that they can have substructure when needed. Second, it
seems likely to me that people will want to extend explain further in
the future: indeed, that was the whole point of the explain-options
patch which was already committed. That's pretty simple in the
current design - just add a few more calls to ExplainPropertyText or
ExplainPropertyList in the appropriate place, and you're done. I'm
pretty sure that splitting things up between attributes and nested
tags would complicate such modifications.

Peter Eisentraut, in an earlier review of this patch, complained about
the format as well, saying something along the lines of "this is
trying to be all things to all people". I don't want to dismiss that
criticism, but neither can I put my finger on the problem. In an
ideal world, we'd like to be all things to all people, but it's
usually not possible to achieve that in practice. Still, it's not
clear to me what need this wouldn't serve. It's possible to generate
the text format from the XML or JSON format, so it should be
well-suited to graphical presentation of explain output. It's also
possible to grope through the output and, say, find the average cost
of all your seqscan nodes, or verify the absence of merge joins, or
anything of that sort that someone might think that they want to do.

In a nutshell, the design is "take all the fields we have now and put
XML/JSON markup around them so they're easier to get to". Maybe
that's not enough of a design, but I don't have any other ideas.

> Also, the JSON code seems a bit messy/poorly factorized.  Is there
> a reason for that, or just it's not as mature as the XML code?

I wrote them together, so it's not a question of code maturity, but I
wouldn't rule out my being dumb. I'm open to suggestions... AFAICS,
the need to comma-separate list and hash elements is most of the
problem. I had thought about replacing es->needs_separator with a
list so that we could push/pop elements, but I wasn't totally sure
whether that was a good idea.

...Robert
