From: | Joseph Shraibman <jks(at)selectacast(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-general(at)postgreSQL(dot)org |
Subject: | Re: order of nested loop |
Date: | 2003-06-17 18:30:41 |
Message-ID: | 3EEF5E51.4030306@selectacast.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Tom Lane wrote:
> Joseph Shraibman <jks(at)selectacast(dot)net> writes:
>
>>The planner shows this for the scan on d:
>>(cost=0.00..2380577.42 rows=525568 width=49)
>>Maybe it thinks it will reach the limit of 25 before it actually does,
>>which is why it is willing to try something so expensive?
>
>
> Yeah, exactly, it's extrapolating that it will only actually have to
> process a relatively small part of that scan. Which would be true if
> it were getting 4492 rows out of the join per estimate, and not just 1
> per reality. This is the same estimation failure as in the other plan,
> I think, but it's a lot simpler to see in the other plan.
>
>
>>... Thus it would make sense to first get the entries in u,
>>filter them, then filter by their status in d.
>
>
> Right, but the problem is to know how many u entries will get through
> the filter. When that estimate is off by a factor of ~5000, it's no
> surprise that the planner is likely to choose the wrong plan. If you
> could cut that estimation error by even a factor of 2, it would have
> made the right choices here.
>
> So we're back to my previous question: why is that estimate so far off?
> You might try comparing
> explain select * from usertable where podkey = 20;
> select count(*) from usertable where podkey = 20;
=> explain select * from usertable where podkey = 20;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Index Scan using usertable_podkey_key on usertable
(cost=0.00..16019.99 rows=5923 width=625)
Index Cond: (podkey = 20)
(2 rows)
=> select count(*) from usertable where podkey = 20;
count
-------
3
(1 row)
=> select * from pg_stats where tablename = 'usertable' and attname
in('podkey','status','banned');
schemaname | tablename | attname | null_frac | avg_width | n_distinct
| most_common_vals |
most_common_freqs |
histogram_bounds | correlation
------------+-----------+---------+-----------+-----------+------------+------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------+-------------
public | usertable | podkey | 0 | 4 | 66
| {<actual numbers deleted, but 20 isn't on of them>} |
{0.208,0.156,0.112,0.0696667,0.0306667,0.028,0.0273333,0.025,0.0243333,0.023}
| {10,90,137,140,197,207,246,264,267,269,300} | 0.53816
public | usertable | status | 0 | 2 | 4
| {2,4,1,3} |
{0.938,0.0496667,0.0103333,0.002}
| | 0.840237
public | usertable | banned | 0 | 1 | 2
| {f,t} | {0.982,0.018}
|
| 0.9339
(3 rows)
=> select count(*) from usertable;
count
---------
1121190
(1 row)
> to see whether the estimate is failing on the basic podkey=20 part.
> If that doesn't seem too far off, add in the status = 2 and/or
> (NOT banned) parts to see what confuses it. I'd like to see the
> pg_stats rows for these three columns, too.
>
> BTW, you have done an ANALYZE recently on usertable, I hope.
Yeah, every week by cron job.
From | Date | Subject | |
---|---|---|---|
Next Message | Antti Haapala | 2003-06-17 18:42:08 | Re: Sort memory not being released |
Previous Message | Michael Meskes | 2003-06-17 18:20:34 | Re: postgreSQL on NAS/SAN? |