Re: Bad query plan inside EXISTS clause

From: Benoit Delbosc <bdelbosc(at)nuxeo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad query plan inside EXISTS clause
Date: 2010-03-10 16:59:38
Message-ID: 4B97CFFA.6030000@nuxeo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom Lane a écrit :
> Benoit Delbosc <bdelbosc(at)nuxeo(dot)com> writes:
>> I am trying to understand why inside an EXISTS clause the query planner
>> does not use the index:
>
> I'm not sure this plan is as bad as all that. The key point is that the
> planner is expecting 52517 rows that match that users_md5 value (and the
> true number is evidently 51446, so that estimate isn't far off). That's
> about 1/48th of the table. It knows that the EXISTS case can stop as
> soon as it finds one match, so it's betting that a plain seqscan will
> hit a match faster than an index lookup would be able to, ie,
> seqscanning about 48 tuples is faster than one index lookup. This might
> be a bad bet if the users_md5 values are correlated with physical order,
> ie the matches are not randomly scattered but are all towards the end of
> the table.
exact, the data is not randomly scattered but ordered this explains why
in my case seq scan is a bad bet

Barring that, though, it could be a good bet if the table
> isn't swapped in. Which is what the default cost parameters are set
> up to assume.
there are lots of shared buffers and effective memory on this instance,
the query is executed many times I can assume that the table isn't
swapped in right ?

> I suspect your real complaint is that you expect the table to be swapped
> in, in which case what you ought to be doing is adjusting the planner's
> cost parameters. Some playing around here with a similar case suggests
> that even a small reduction in random_page_cost would make it prefer an
> indexscan for this type of situation.
excellent !

Changing the random_page_cost from 4 to 2 do the trick

SET random_page_cost = 2;
EXPLAIN ANALYZE SELECT 1 WHERE EXISTS (SELECT 1 FROM read_acls_cache
WHERE users_md5 = '9bc9012eb29c0bb2ae3cc7b5e78c2acf');


QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------
Result (cost=1.06..1.07 rows=1 width=0) (actual time=0.048..0.048
rows=1 loops=1)
One-Time Filter: $0
InitPlan 1 (returns $0)
-> Index Scan using read_acls_cache_users_md5_idx on
read_acls_cache (cost=0.00..55664.21 rows=52517 width=0) (actual
time=0.045..0.045 rows=1 loops=1)
Index Cond: ((users_md5)::text =
'9bc9012eb29c0bb2ae3cc7b5e78c2acf'::text)
Total runtime: 0.087 ms
(6 rows)

466/0.087 = 5360 thanks !

kind regards

ben

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Haas 2010-03-10 21:55:32 Re: Strange workaround for slow query
Previous Message Harald Fuchs 2010-03-10 16:29:25 Re: Strange workaround for slow query