Re: select count(*) from anIntColumn where int_value = 0;

From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: select count(*) from anIntColumn where int_value = 0;
Date: 2004-02-11 14:14:39
Message-ID: m3n07p7lf4.fsf@wolfe.cbbrowne.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Oops! stehule(at)kix(dot)fsv(dot)cvut(dot)cz (Pavel Stehule) was seen spray-painting on a wall:
>
> Regards
> Pavel Stehule
>
> On Wed, 11 Feb 2004, David Teran wrote:
>
>> Hi
>>
>> we have a table with about 4 million rows. One column has an int value,
>> there is a btree index on it. We tried to execute the following
>> statement and it is very slow on a dual G5 2GHZ with 4 GB of RAM.
>>
>> explain analyze select count(*) from job_property where int_value = 0;
>>
>> Aggregate (cost=144348.80..144348.80 rows=1 width=0) (actual
>> time=13536.852..13536.852 rows=1 loops=1)
>> -> Seq Scan on job_property (cost=0.00..144255.15 rows=37459
>> width=0) (actual time=19.422..13511.653 rows=42115 loops=1)
>> Filter: (int_value = 0)
>> Total runtime: 13560.862 ms
>>
> If you has index on id, then you can use
> SELECT id FROM tabulka ORDER BY id DESC LIMIT 1;
>
> See 4.8. FAQ

I'm afraid that's not the answer. That would be the faster
alternative to "select max(id) from tabulka;"

I guess the question is, is there a faster way of coping with the
"int_value = 0" part?

It seems a little odd that the index was not selected; it appears that
the count was 42115, right?

The estimated number of rows was 37459, and if the table size is ~4M,
then I would have expected the query optimizer to use the index.

Could you try doing "ANALYZE JOB_PROPERTY;" and then try again?

One thought that comes to mind is that perhaps the statistics are
outdated.

Another thought is that perhaps there are several really common
values, and the statistics are crummy. You might relieve that by:

alter table job_property alter column int_value set statistics 20;
analyze job_property;

(Or perhaps some higher value...)

If there are a few very common discrete values in a particular field,
then the default statistics may get skewed because the histogram
hasn't enough bins...
--
let name="cbbrowne" and tld="acm.org" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/wp.html
Rules of the Evil Overlord #102. "I will not waste time making my
enemy's death look like an accident -- I'm not accountable to anyone
and my other enemies wouldn't believe it.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message alemon 2004-02-11 14:23:31 slow database
Previous Message stefan bogdan 2004-02-11 14:08:07 update performance