Re: Searching for the cause of a bad plan

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Csaba Nagy <nagy(at)ecircle-ag(dot)com>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Searching for the cause of a bad plan
Date: 2007-09-24 13:27:09
Message-ID: 1190640429.4181.165.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, 2007-09-21 at 19:30 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > That's not my perspective. If the LIMIT had been applied accurately to
> > the cost then the hashjoin would never even have been close to the
> > nested join in the first place.
>
> [ shrug... ] Your perspective is mistaken. There is nothing wrong with
> the way the LIMIT estimation is being done. The plan in question was
>
> Limit (cost=0.00..498511.80 rows=10 width=1804) (actual time=17.729..21.672 rows=2 loops=1)
> -> Nested Loop (cost=0.00..1794642.48 rows=36 width=1804) (actual time=17.729..21.671 rows=2 loops=1)
> -> Index Scan using pk_table_a on table_a ta (cost=0.00..324880.88 rows=388638 width=16) (actual time=0.146..0.198 rows=2 loops=1)
> Index Cond: (a = $1)
> -> Index Scan using pk_table_b2 on table_b2 tb (cost=0.00..3.77 rows=1 width=1788) (actual time=10.729..10.731 rows=1 loops=2)
> Index Cond: (ta.b = tb.b)
> Total runtime: 21.876 ms
>
> and there are two fairly serious estimation errors here, neither related
> at all to the LIMIT:
>
> * five-orders-of-magnitude overestimate of the number of table_a rows
> that will match the condition on a;

I don't see any problem with this estimate, but I do now agree there is
a problem with the other estimate.

We check to see if the value is an MFV, else we assume that the
distribution is uniformly distributed across the histogram bucket.

Csaba provided details of the fairly shallow distribution of values of a
in table_a. 96% of rows aren't covered by the MFVs, so its a much
shallower distribution than is typical, but still easily possible. So
based upon what we know there should be ~330,000 rows with the value of
a used for the EXPLAIN.

So it looks to me like we did the best we could with the available
information, so I can't see that as a planner problem per se. We cannot
do better a priori without risking worse plans in other circumstances.

> * enormous underestimate of the number of join rows --- it's apparently
> thinking only 0.0001 of the table_a rows will have a join partner,
> whereas at least for this case they all do.

OK, I agree this estimate does have a problem and it has nothing to do
with LIMIT.

Looking at the code I can't see how this selectivity can have been
calculated. AFAICS eqjoinsel() gives a selectivity of 1.0 using the data
supplied by Csaba and it ought to cover this case reasonably well.

Csaba, please can you copy that data into fresh tables, re-ANALYZE and
then re-post the EXPLAINs, with stats data.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message brauagustin-susc 2007-09-24 13:59:26 Re: Low CPU Usage
Previous Message Ow Mun Heng 2007-09-24 06:58:36 [OT] Re: [Again] Postgres performance problem