Re: order of nested loop

From: Joseph Shraibman <jks(at)selectacast(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgreSQL(dot)org
Subject: Re: order of nested loop
Date: 2003-06-17 18:30:41
Message-ID: 3EEF5E51.4030306@selectacast.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Tom Lane wrote:
> Joseph Shraibman <jks(at)selectacast(dot)net> writes:
>
>>The planner shows this for the scan on d:
>>(cost=0.00..2380577.42 rows=525568 width=49)
>>Maybe it thinks it will reach the limit of 25 before it actually does,
>>which is why it is willing to try something so expensive?
>
>
> Yeah, exactly, it's extrapolating that it will only actually have to
> process a relatively small part of that scan. Which would be true if
> it were getting 4492 rows out of the join per estimate, and not just 1
> per reality. This is the same estimation failure as in the other plan,
> I think, but it's a lot simpler to see in the other plan.
>
>
>>... Thus it would make sense to first get the entries in u,
>>filter them, then filter by their status in d.
>
>
> Right, but the problem is to know how many u entries will get through
> the filter. When that estimate is off by a factor of ~5000, it's no
> surprise that the planner is likely to choose the wrong plan. If you
> could cut that estimation error by even a factor of 2, it would have
> made the right choices here.
>
> So we're back to my previous question: why is that estimate so far off?
> You might try comparing
> explain select * from usertable where podkey = 20;
> select count(*) from usertable where podkey = 20;

=> explain select * from usertable where podkey = 20;
QUERY PLAN

-----------------------------------------------------------------------------------------------
Index Scan using usertable_podkey_key on usertable
(cost=0.00..16019.99 rows=5923 width=625)
Index Cond: (podkey = 20)
(2 rows)

=> select count(*) from usertable where podkey = 20;
count
-------
3
(1 row)

=> select * from pg_stats where tablename = 'usertable' and attname
in('podkey','status','banned');
schemaname | tablename | attname | null_frac | avg_width | n_distinct
| most_common_vals |
most_common_freqs |
histogram_bounds | correlation
------------+-----------+---------+-----------+-----------+------------+------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------+-------------
public | usertable | podkey | 0 | 4 | 66
| {<actual numbers deleted, but 20 isn't on of them>} |
{0.208,0.156,0.112,0.0696667,0.0306667,0.028,0.0273333,0.025,0.0243333,0.023}
| {10,90,137,140,197,207,246,264,267,269,300} | 0.53816
public | usertable | status | 0 | 2 | 4
| {2,4,1,3} |
{0.938,0.0496667,0.0103333,0.002}
| | 0.840237
public | usertable | banned | 0 | 1 | 2
| {f,t} | {0.982,0.018}
|
| 0.9339
(3 rows)

=> select count(*) from usertable;
count
---------
1121190
(1 row)

> to see whether the estimate is failing on the basic podkey=20 part.
> If that doesn't seem too far off, add in the status = 2 and/or
> (NOT banned) parts to see what confuses it. I'd like to see the
> pg_stats rows for these three columns, too.
>

> BTW, you have done an ANALYZE recently on usertable, I hope.

Yeah, every week by cron job.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Antti Haapala 2003-06-17 18:42:08 Re: Sort memory not being released
Previous Message Michael Meskes 2003-06-17 18:20:34 Re: postgreSQL on NAS/SAN?