Re: count(*) slow on large tables

From: Sean Chittenden <sean@chittenden.org>
To: Shridhar Daithankar <shridhar_daithankar@persistent.co.in>
Cc: pgsql-performance@postgresql.org
Subject: Re: count(*) slow on large tables
Date: 2003-10-06 17:01:36
Message-ID: 20031006170136.GB94718@perrin.nxad.com
Lists: pgsql-hackers pgsql-performance

> How will it help? This is in addition to the trigger proposal that
> came up earlier. With triggers it's not possible to make values
> visible across backends unless the trigger updates a table, which
> eventually leads to the vacuum/dead-tuples problem.
>
> 1. The user creates a trigger to check updates/inserts for certain
> conditions.
> 2. It updates the count as and when required.
> 3. If the trigger detects that the count is not initialized, it
> issues the same query the first time. There is no avoiding this
> issue.
>
> Besides, a facility for resident variables could be used
> imaginatively as well.
>
> Does this make sense? IMO this is a more generalised approach
> overall.

I do this _VERY_ frequently in my databases, except that I have my
stored procs maintain the aggregate in a predefined MVCC table that's
always there. Here's a denormalized version for public
consumption/thought:

CREATE TABLE global.dba_aggregate_cache (
    dbl        TEXT NOT NULL, -- The database location; doesn't need to be
                              -- fully qualified (ex: schema.table.col)
    op         TEXT NOT NULL, -- The operation: SUM, COUNT, etc.
    qual       TEXT,          -- Any conditional, such as a WHERE clause
    val_int    INT,           -- The value, when of type INT
    val_bigint BIGINT,        -- The value, when of type BIGINT
    val_text   TEXT,          -- The value, when of type TEXT
    val_bytea  BYTEA          -- The value, when of type BYTEA
);
CREATE UNIQUE INDEX dba_aggregate_cache_dbl_op_udx
    ON global.dba_aggregate_cache (dbl, op);

Then I use a function to retrieve this value instead of running a
SELECT COUNT(*):

SELECT public.cache_count('dbl', 'qual'); -- here the op is COUNT
SELECT public.cache_count('dbl');         -- the COUNT for the table named in dbl
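
A minimal sketch of what the retrieval function can look like (the
body is illustrative rather than my production code; it assumes the
COUNT is kept in val_bigint, and the DEFAULT covers the one-argument
call form):

CREATE OR REPLACE FUNCTION public.cache_count(
    p_dbl TEXT, p_qual TEXT DEFAULT NULL)
RETURNS BIGINT AS $$
DECLARE
    v_count BIGINT;
BEGIN
    SELECT val_bigint INTO v_count
      FROM global.dba_aggregate_cache
     WHERE dbl = p_dbl
       AND op = 'COUNT'
       AND qual IS NOT DISTINCT FROM p_qual;
    IF NOT FOUND THEN
        -- First call: run the real aggregate and prime the cache
        -- (function #1, sketched after the list below).
        v_count := global.dba_aggregate_cache_populate(p_dbl, 'COUNT', p_qual);
    END IF;
    RETURN v_count;
END;
$$ LANGUAGE plpgsql;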

Then I create four or five functions, depending on the op I'm
performing (sketches of #1 and #3 follow this list):

1) A private function, _not_ SECURITY DEFINER, that populates the
   global.dba_aggregate_cache row if it's empty.
2) A STABLE function for SELECTs; if the row doesn't exist, it calls
   function #1 to create it.
3) A function for INSERTs (VOLATILE, since it writes to the cache);
   if the row doesn't exist, it calls function #1 to create it, then
   adds the necessary bits to keep it accurate.
4) A function for DELETEs; if the row doesn't exist, it calls
   function #1 to create it, then subtracts the necessary bits to
   keep it accurate.
5) A function for UPDATEs; if the row doesn't exist, it calls
   function #1 to create it, then updates the necessary bits to keep
   it accurate. It's not uncommon for me to skip the UPDATE
   function/trigger entirely.
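
Rough sketches of #1 and #3, under a few assumptions (the function
names are illustrative, dbl is taken to be a schema-qualified table
name, and the COUNT lives in val_bigint; a production version would
parse the column out of dbl and guard the dynamic SQL against
injection):

-- #1: the private populator; runs the real aggregate once and caches it.
CREATE OR REPLACE FUNCTION global.dba_aggregate_cache_populate(
    p_dbl TEXT, p_op TEXT, p_qual TEXT)
RETURNS BIGINT AS $$
DECLARE
    v_sql TEXT;
    v_val BIGINT;
BEGIN
    -- COUNT(*) form; other ops would aggregate over a column parsed
    -- out of p_dbl.
    v_sql := 'SELECT ' || p_op || '(*) FROM ' || p_dbl;
    IF p_qual IS NOT NULL THEN
        v_sql := v_sql || ' WHERE ' || p_qual;
    END IF;
    EXECUTE v_sql INTO v_val;
    INSERT INTO global.dba_aggregate_cache (dbl, op, qual, val_bigint)
    VALUES (p_dbl, p_op, p_qual, v_val);
    RETURN v_val;
END;
$$ LANGUAGE plpgsql;

-- #3: fired AFTER INSERT; bumps the cached COUNT, priming it on first
-- use (the new row is already visible then, so the COUNT includes it).
CREATE OR REPLACE FUNCTION global.dba_aggregate_cache_ins()
RETURNS TRIGGER AS $$
BEGIN
    UPDATE global.dba_aggregate_cache
       SET val_bigint = val_bigint + 1
     WHERE dbl = TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME
       AND op = 'COUNT';
    IF NOT FOUND THEN
        PERFORM global.dba_aggregate_cache_populate(
            TG_TABLE_SCHEMA || '.' || TG_TABLE_NAME, 'COUNT', NULL);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

#4 is the same trigger function with "val_bigint - 1".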

Create triggers for functions 3-5 (function #2 is simply called in
place of COUNT(*); sample trigger DDL below), and test away. It's
MVCC-safe, and an indexed lookup of a single row is obviously vastly
faster than a seqscan plus aggregate. If I need any kind of aggregate
to be fast, I use this scheme with a derivation of the above table.
The one problem is that I have to retrain others to use cache_count(),
or some other function, instead of COUNT(*).
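
The trigger wiring, against a hypothetical table public.foo
(dba_aggregate_cache_del() being the "val_bigint - 1" twin of the
INSERT function above):

CREATE TRIGGER foo_count_ins AFTER INSERT ON public.foo
    FOR EACH ROW EXECUTE PROCEDURE global.dba_aggregate_cache_ins();
CREATE TRIGGER foo_count_del AFTER DELETE ON public.foo
    FOR EACH ROW EXECUTE PROCEDURE global.dba_aggregate_cache_del();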

That said, it'd be nice if there were a way to tell PostgreSQL to do
the above for you, teaching COUNT(*), SUM(), and other aggregates to
use an MVCC-backed cache similar to the above. If people want their
COUNTs to be fast, they have to live with the added INSERT, UPDATE,
and DELETE cost. The above doesn't work with anything complex such as
joins, but it's certainly a start, and I think it satisfies everyone's
gripes other than the tuple churn that _does_ happen (*nudge nudge*,
pg_autovacuum could be integrated into the backend to handle this).
For those worried about performance: the pages that are constantly
being recycled would likely stay in disk cache (PG's or the OS's).
There's still some commit overhead, but still... no need to
over-optimize by requiring the table to be stored in the outdated,
slow, and overused shm (also, *nudge nudge*).

Anyway, let me throw that out there as a solution I use, and it works
quite well. I didn't explain the use of the qual column, but I think
those who grasp the above way of handling things grok how to use qual
in a dynamically executed query; a quick example follows.
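
A quick illustration, with a hypothetical table and qual:

SELECT public.cache_count('public.foo', 'active IS TRUE');

Given those arguments, function #1 builds and EXECUTEs:

SELECT COUNT(*) FROM public.foo WHERE active IS TRUE;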

CREATE AGGREGATE CACHE anyone?

-sc

--
Sean Chittenden
