Re: Query planner cost estimate less than the sum of its parts?

From: "Scott Carey" <scott(at)richrelevance(dot)com>
To: "Gregory Stark" <stark(at)enterprisedb(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Query planner cost estimate less than the sum of its parts?
Date: 2008-11-05 22:19:32
Message-ID: a1ec7d000811051419m294b967akb6a7caf7aac5ead1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I'll have to think a bit about that given that the query had run for 20
hours of 250MB/sec-ish disk reads and wasn't done. Luckily, thats not even
35% disk utilization on this system, and the 'right' query with fewer tables
does things properly with a hash and takes seconds rather than hours
(days?).

If it can short-circuit the search, then its probably extremely
underestimating how much data it has to look through before finding a match,
which I'd expect out of a partitioned table query since the planner
assumptions around those are generally bad to really bad (as in, the
aggregate statistics on a list of tables is essentially not used or
calculated/estimated wrong). I suppose the real problem is there, its going
to have to look through most of this data to find a match, on every loop,
and the planner has no clue.
If the nested loop was the other way around it would not have even pinned
the disk and have been all in memory on the matching. If it had hashed all
of the estimated 128K values in the top -- which at 1GB for work_mem it
should but does not -- it could have scanned once for matches and thrown out
those in the hash that did not have a match.

Anyhow this isn't causing a problem at the moment, and it looks like the
usual culprit with poor planner choices on partition tables and not a new
one.

On Wed, Nov 5, 2008 at 1:22 PM, Gregory Stark <stark(at)enterprisedb(dot)com>wrote:

>
> "Scott Carey" <scott(at)richrelevance(dot)com> writes:
>
> > Certainly, a cost estimate that is ... LESS than one of the sub sections
> of
> > the query is wrong. This was one hell of a broken query, but it at
> least
> > should have taken an approach that was not a nested loop, and I'm curious
> if
> > that choice was due to a bad estimate here.
> >
> > Nested Loop IN Join (cost=0.00..3850831.86 rows=128266 width=8)
>
> Because it's an IN join it doesn't have to run the inner join to
> completion.
> Once it finds a match it can return the outer tuple and continue to the
> next
> outer tuple.
>
> --
> Gregory Stark
> EnterpriseDB http://www.enterprisedb.com
> Ask me about EnterpriseDB's 24x7 Postgres support!
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Grittner 2008-11-05 23:16:11 Re: Create and drop temp table in 8.3.4
Previous Message Gregory Stark 2008-11-05 21:22:09 Re: Query planner cost estimate less than the sum of its parts?