Skip site navigation (1) Skip section navigation (2)

Re: count(*) slow on large tables

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Christopher Browne <cbbrowne(at)libertyrms(dot)info>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: count(*) slow on large tables
Date: 2003-10-03 05:13:08
Message-ID: 87brsyrjiz.fsf@stark.dyndns.tv (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-performance
Christopher Browne <cbbrowne(at)libertyrms(dot)info> writes:

> It would be very hairy to implement it correctly, and all this would
> cover is the single case of "SELECT COUNT(*) FROM SOME_TABLE;"
> 
> If you had a single WHERE clause attached, you would have to revert to
> walking through the tuples looking for the ones that are live and
> committed, which is true for any DBMS.

Well it would be handy for a few other cases as well. 

1 It would be useful for the case where you have a partial index with a
  matching where clause. The optimizer already considers using such indexes
  but it has to pay the cost of the tuple lookup, which is substantial.

2 It would be useful for the very common queries of the form 
  WHERE x IN (select id from foo where some_indexed_expression)

  (Or the various equivalent forms including outer joins that test to see if
  the matching record was found and don't retrieve any other columns in the
  select list.)

3 It would be useful for many-many relationships where the intermediate table
  has only the two primary key columns being joined. If you create a
  multi-column index on the two columns it shouldn't need to look up the
  tuple. This would be effectively be nearly equivalent to an "index organized
  table".


4 It would be useful for just about all the referential integrity queries...


I don't mean to say this is definitely a good thing. The tradeoff in
complexity and time to maintain the index pages would be large. But don't
dismiss it as purely a count(*) optimization hack.

I know Oracle is capable of it and it can speed up your query a lot when you
remove that last unnecessary column from a join table allowing oracle to skip
the step of reading the table.

-- 
greg


In response to

pgsql-performance by date

Next:From: Josh BerkusDate: 2003-10-03 05:27:08
Subject: Re: TPC-R benchmarks
Previous:From: Dror MatalonDate: 2003-10-03 04:27:54
Subject: Re: count(*) slow on large tables

pgsql-hackers by date

Next:From: Shridhar DaithankarDate: 2003-10-03 06:29:02
Subject: Re: count(*) slow on large tables
Previous:From: Greg StarkDate: 2003-10-03 04:50:12
Subject: Re: minor view creation weirdness

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group