Re: how to identify expensive steps in an explain analyze output

From: "Frits Hoogland" <frits(dot)hoogland(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: how to identify expensive steps in an explain analyze output
Date: 2008-02-27 17:36:52
Message-ID: fbb8fbcd0802270936p4e0b6cabn9738fa6e89e69e82@mail.gmail.com
Lists: pgsql-performance

Thanks for your answer!

Okay, "cost" was the wrong word here; I am looking at the "actual
time" figures in the execplan.

What I am trying to do is optimise a database, both by investigating
execplans and by thinking about what concurrency would do to it. (The
database is reported to almost stop functioning under load.)

I've read the sections you pointed out; they are quite understandable.

What I am missing is the connection between the logical steps (i.e.
the execplan) and their physical implications. For example: because a
seqscan is a leaf node in the execplan, the time it took to complete
should be the difference between the two numbers after "actual time".
Is that assumption (scantime = totaltime - startuptime) right? I can
also compute approximately how long each row took by dividing that
time by the number of rows (and loops), as in the sketch below.
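For instance, given a hypothetical plan node (invented numbers):

    Seq Scan on orders  (cost=0.00..4425.00 rows=100000 width=44)
                        (actual time=0.021..95.300 rows=100000 loops=1)

the scan would have taken roughly 95.300 - 0.021 = 95.279 ms in total,
or about 95.279 / (100000 * 1) = 0.00095 ms per row.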
But I do not know how many physical I/Os it has done, or how many
logical I/Os. The same goes for merge joins and sorts: the physical
implication (writing to disk and reading back when the amount of data
exceeds work_mem) is not in the execplan. That is the reason I
mentioned "cost".
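The closest thing to a workaround I can see is to bracket the query
with the statistics views. A sketch (it assumes an otherwise idle test
database and a hypothetical table "orders"; blocks "read" may still
come from the OS cache, so this only approximates physical I/O):

    SELECT pg_stat_reset();       -- zero this database's I/O counters
    -- ... run the query under investigation ...
    SELECT relname,
           heap_blks_read,        -- blocks requested from the kernel
           heap_blks_hit          -- blocks found in shared_buffers
    FROM pg_statio_user_tables
    WHERE relname = 'orders';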
I now understand that it is impossible to judge the "cost" of a merge
join on its own, because its time is a composite of both the
underlying scans and the merge operation itself, right?
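For example (again invented numbers):

    Merge Join  (actual time=210.4..480.9 rows=50000 loops=1)
      ->  Sort  (actual time=120.1..150.6 rows=50000 loops=1)
            ->  Seq Scan on a  (actual time=0.02..60.3 rows=50000 loops=1)
      ->  Sort  (actual time=90.2..110.8 rows=50000 loops=1)
            ->  Seq Scan on b  (actual time=0.03..45.1 rows=50000 loops=1)

The join node's 480.9 ms includes the 150.6 + 110.8 ms its children
report, so the merge step itself accounts for roughly 480.9 - 150.6 -
110.8 = 219.5 ms (with loops=1 everywhere; for loops > 1 the child
times would have to be multiplied first).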

Is there any way to identify the nodes in the execplan which "cost"
the most (CPU time, I/O, etc.)?

regards

frits

On 2/27/08, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> "Frits Hoogland" <frits(dot)hoogland(at)gmail(dot)com> writes:
> > The manual states: "Actually two numbers are shown: the start-up time
> before
> > the first row can be returned, and the total time to return all the
> rows.".
> > Does this mean that the difference between the first and second is the
> cost
> > or the time the step in the explain has taken?
>
> No, or at least only for a very strange definition of "cost". An
> example of the way these are used is that for a hash join, the startup
> time would include the time needed to scan the inner relation and build
> the hash table from it. The run time (ie, difference between startup
> and total) represents the part of the process where we're scanning the
> outer relation and probing into the hash table for matches. Rows are
> returned as matches are found during this part of the process. I can't
> think of any useful definition under which the startup time would be
> ignored.
>
> The reason the planner divides the total cost like this is that in the
> presence of LIMIT or a few other SQL features, it may not be necessary
> to run the plan to completion, but only to fetch the first few rows.
> In this case a plan with low startup cost may be preferred, even though
> the estimated total cost to run it to completion might be higher than
> some other plan has. We're not *going* to run it to completion, and
> so the really interesting figure is startup cost plus some appropriate
> fraction of run cost. You can see this at work if you look at the
> EXPLAIN numbers for a query involving a LIMIT.
>
> The whole thing might make a bit more sense if you read
> http://www.postgresql.org/docs/8.3/static/overview.html
> particularly the last two subsections.
>
> regards, tom lane
>
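To make the hash join case concrete (hypothetical plan, invented
numbers):

    Hash Join  (actual time=130.2..390.7 rows=100000 loops=1)
      Hash Cond: (o.customer_id = c.id)
      ->  Seq Scan on orders o  (actual time=0.02..85.4 rows=100000 loops=1)
      ->  Hash  (actual time=129.9..129.9 rows=10000 loops=1)
            ->  Seq Scan on customers c  (actual time=0.01..55.2 rows=10000 loops=1)

The Hash node's 129.9 ms of build time shows up almost entirely in the
join's 130.2 ms startup time; the remaining 390.7 - 130.2 = 260.5 ms is
the probe phase over the outer scan.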
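And the LIMIT effect (again invented numbers):

    EXPLAIN SELECT * FROM orders ORDER BY id LIMIT 10;

    Limit  (cost=0.00..3.44 rows=10 width=44)
      ->  Index Scan using orders_pkey on orders
            (cost=0.00..34400.00 rows=100000 width=44)

The index scan's estimated total cost is 34400, but under the LIMIT the
planner charges only its startup cost plus 10/100000 of the run cost,
about 3.44, so the cheap-to-start plan wins even though running it to
completion would be expensive.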
