Re: Query progress indication - an implementation

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Scara Maccai <m_lists(at)yahoo(dot)it>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query progress indication - an implementation
Date: 2009-06-28 17:38:59
Message-ID: 603c8f070906281038r2dccf5c3u3562d930cbd91f24@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 26, 2009 at 3:37 AM, Scara Maccai<m_lists(at)yahoo(dot)it> wrote:
> Hi all,
>
> following the link in
>
> http://wiki.postgresql.org/wiki/Query_progress_indication
>
> but mostly:
>
> http://www.postech.ac.kr/~swhwang/progress2.pdf [1]
>
> I'm trying to write an implementation of the "dne" method in postgresql.
>
> I added another column to the pg_stat_get_activity function to report the percentage of work done for the query (of course, any other method could be used... the way the percentage is reported to the user can be easily changed).
>
> I attached a first patch (just to see if anyone is interested, the work is by no means finished).
>
> I guess I did a lot of mistakes, since I don't know anything about postgresql code...
>
> 1) the progress indicator can be eliminated at runtime; this could be done with another runtime flag (at the moment is always on)
>
> 2) I added a new structure (Progress) to PlanState to keep all the info about execution progress
>
> 3) I needed a pointer to the root of the PlanStates, to be able to calculate the total progress of the query tree (I bet this pointer was already available somewhere, but I couldn't find where...)
>
> 4) sub-plans are not included yet (well, just to be honest, I don't really know what postgresql means with those... :) )
>
> 5) the percentage is updated at most every second (can be easily changed)
>
> 6) the methods to adjust upper/lower bounds in [1] are not implemented yet (but that shouldn't be a problem)
>
> 7) the "spilled tuples" handling in [1] is not supported yet
>
> 8) only hash join, loop join, aggregate, sequence scans are implemented at the moment
>
> 9) I added another flag (EXEC_FLAG_DRIVER_BRANCH) in executor.h to signal to the sub-nodes if they are part of a branch that will contain a driver node (for example, inner subtree of a Nested Loops join is not a driver branch). I guess this could be done better at Plan level (instead of PlanState), but this way less code has to be changed
>
> 10) at the moment all driver nodes have the same "work_per_tuple=1", but this could be changed (for example, CPU-intensive driver nodes could have a smaller work_per_tuple value)
>
> Well, some (very early) tests on a tpcd db showed it works as expected (well, I only did very few tests...)

You might want to take a look at this:
http://wiki.postgresql.org/wiki/Submitting_a_Patch

The project style is not to use C++-style comments, and you should
eliminate all of the unnecessary diff hunks from your patch (like
files that have only comment or whitespace changes). Also, it is
requested that patches be submitted in context diff format and added
to the CommitFest wiki here:

http://wiki.postgresql.org/wiki/CommitFest_2009-First

As to the content of the patch, I think that what you are doing is
comparing the actual number of "operations" with the expected number
of operations. If that's correct, I'm not sure it's really all that
useful, because it will only give you accurate
percentage-of-completion information when the estimates are correct.
But when the estimates are correct, you probably have a pretty good
idea how long the query will take to run anyway. When the estimates
are off, you'll find that the actual number of operations is more than
the expected number of operations, but that won't really tell you how
far you have to go.

The only other use case I can think of for functionality of this type
is some kind of dashboard view on a system with very long-running
queries, where you want to see how far you have yet to go on each one
(maybe to approximate when you can submit the next one) without having
detailed knowledge of how expensive each individual query was project
to be. But that's a pretty narrow use case, and I'm not sure it
really justifies the overhead of instrumenting every query in this
way. For a fraction of the run-time cost, you could include the
estimated total cost of the query in the pg_stat_activity output,
which would let the user do much the same thing presuming that they
have some knowledge of the usual ratio between costs and execution
times.

Greg Stark was (is?) working on a way to get EXPLAIN-ANALYZE type
statistics on running queries; you might want to take a look at some
of that work and see what you think.

http://wiki.postgresql.org/wiki/Greg_Stark%27s_Development_Projects#EXPLAIN_queries_in_progress

...Robert

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2009-06-28 18:07:30 pre-proposal: permissions made easier
Previous Message Hiroshi Saito 2009-06-28 14:48:53 Re: Problem building from source on Windows