Re: 8.1 count(*) distinct: IndexScan/SeqScan

From: Pailloncy Jean-Gerard <jg(at)rilk(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: 8.1 count(*) distinct: IndexScan/SeqScan
Date: 2005-12-06 16:19:34
Message-ID: EF48EB59-0D18-4010-AE23-52CE1E28954A@rilk.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

After few test, the difference is explained by the
effective_cache_size parameter.

with effective_cache_size=1000 (default)
the planner chooses the following plan
postgres=# explain select count(*) from (select distinct on (val) *
from test) as foo;
QUERY PLAN
------------------------------------------------------------------------
--------
Aggregate (cost=421893.64..421893.65 rows=1 width=0)
-> Unique (cost=385193.48..395679.24 rows=2097152 width=8)
-> Sort (cost=385193.48..390436.36 rows=2097152 width=8)
Sort Key: test.val
-> Seq Scan on test (cost=0.00..31252.52
rows=2097152 width=8)
(5 rows)

with effective_cache_size=15000
the planner chooses the following plan
postgres=# explain select count(*) from (select distinct on (val) *
from test) as foo;
QUERY PLAN
------------------------------------------------------------------------
------------------
Aggregate (cost=101720.39..101720.40 rows=1 width=0)
-> Unique (cost=0.00..75505.99 rows=2097152 width=8)
-> Index Scan using testval on test (cost=0.00..70263.11
rows=2097152 width=8)
(3 rows)

I test some other values for effective_cache_size.
The switch from seq to index scan happens between 9900 and 10000 for
effective_cache_size.

I have my sql server on a OpenBSD 3.8 box with 1 Gb of RAM with
nothing else running on it.
I setup the cachepercent to 25. I expect to have 25% of 1 Gb of RAM
(256 Mb) as file cache.
effective_cache_size=15000 means 15000 x 8K of OS cache = 120,000 Kb
which is lower than my 256 MB of disk cache.

I recall the result of my precedent test.
#rows 2097152
IndexScan 1363396,581s
SeqScan 98758,445s
Ratio 13,805
So the planner when effective_cache_size=15000 chooses a plan that is
13 times slower than the seqscan one.

I did not understand where the problem comes from.
Any help welcome.

Cordialement,
Jean-Gérard Pailloncy

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Joshua Kramer 2005-12-06 16:47:44 TSearch2 vs. Apache Lucene
Previous Message Ameet Kini 2005-12-06 16:14:09 postgresql performance tuning