From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: count(*) and bad design was: Experiences with extensibility
Date: 2008-01-09 22:32:19
Message-ID: 60ejcqy6j0.fsf@dba2.int.libertyrms.com
Lists: pgsql-general

zb(at)cybertec(dot)at (Zoltan Boszormenyi) writes:
> which will be fast and, depending on the initial value of COUNT(*),
> will be very close to the exact figure. You can extend the example
> with more columns if you know your SELECT COUNT(*) ... WHERE
> conditions in advance, but that way you have to keep several
> administrative tables for different monitored tables. Again, this
> trades some disk space and INSERT/DELETE speed on the monitored
> tables for a quicker count.
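
(Reconstructing the snipped example, the proposed scheme amounts to
something like this; the names here are mine, not Zoltan's:)

  -- One counter row per monitored table, updated in place on every
  -- change.  Seed it with the initial COUNT(*):
  --   insert into admin_count select 'foo', count(*) from foo;
  create table admin_count (
      tablename text primary key,
      rec_cnt   bigint not null
  );

  create or replace function maintain_count() returns trigger as $$
  begin
      if TG_OP = 'INSERT' then
          update admin_count set rec_cnt = rec_cnt + 1
           where tablename = TG_TABLE_NAME;
          return NEW;
      else  -- DELETE
          update admin_count set rec_cnt = rec_cnt - 1
           where tablename = TG_TABLE_NAME;
          return OLD;
      end if;
  end;
  $$ language plpgsql;

  create trigger foo_maintain_count
      after insert or delete on foo
      for each row execute procedure maintain_count();

Every writer to foo has to update that one admin_count row, and that
is where the trouble comes in.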

Actually, this approach will be Really Terrible in any case where
multiple connections are adding/deleting tuples concurrently, as it
will force ALL updates to serialize behind the update to the single
row in the central table.

Occasionally, you'll have something even worse, namely a deadlock,
where two or more of the updates fighting over the single summary
tuple fall into a bad state, and one of them is forced to give up,
potentially rolling back its whole transaction.

[Waving hands for a moment]

What I would do *instead* would be for each INSERT to add a tuple with
a count of 1, and for each DELETE to add a tuple with a count of -1,
and then to periodically have a single process walk through and
summarize the table. There may be a further optimization to be had via
a per-statement trigger that counts the number of rows INSERTed or
DELETEd, so that inserting 30 tuples (into the table being tracked)
adds a single tuple with a count of 30 to the summary table.
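
Concretely, the row-level version would look something like this
(a sketch; the rec_count table matches the query below, while the
trigger/function names are just illustrative):

  -- Append-only summary table: many small rows per tracked table,
  -- summed at read time.  Seed it once with the starting count, e.g.:
  --   insert into rec_count select 'foo', count(*) from foo;
  create table rec_count (
      tablename text   not null,
      rec_cnt   bigint not null
  );

  create or replace function track_count() returns trigger as $$
  begin
      if TG_OP = 'INSERT' then
          insert into rec_count(tablename, rec_cnt)
              values (TG_TABLE_NAME, 1);
          return NEW;
      else  -- DELETE
          insert into rec_count(tablename, rec_cnt)
              values (TG_TABLE_NAME, -1);
          return OLD;
      end if;
  end;
  $$ language plpgsql;

  create trigger foo_track_count
      after insert or delete on foo
      for each row execute procedure track_count();

(A per-statement trigger has no direct way to see how many rows the
statement touched, so that variant would take some extra machinery.)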

That changes the tradeoffs, again...

- Since each INSERT/DELETE simply does an INSERT into the summary
  table, the ongoing activity *never* blocks anything

- You get the count by requesting:
    SELECT SUM(rec_cnt) as record_count from rec_count where tablename = 'foo';

- Note that the query is MVCC-consistent with the table!

- Once in a while, you'll want to run a single procedure that, for
  each table, deletes all the existing records and replaces them with
  a single record holding the sum of the individual values (see the
  sketch after this list).

- You can re-sync a table by running the query:
    begin;
    delete from rec_count where tablename = 'foo';
    insert into rec_count(tablename, rec_cnt)
      select 'foo', count(*) from foo;
    commit;
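
The compaction procedure might look like this (a sketch; the brief
exclusive lock keeps the sum and the delete seeing exactly the same
set of rows, at the cost of making writers queue until it commits, so
run it in its own short transaction):

  create or replace function compact_count(t_name text)
  returns bigint as $$
  declare
      total bigint;
  begin
      -- Block new +1/-1 rows so none slip in between the sum and the
      -- delete; the lock is held until the transaction commits.
      lock table rec_count in exclusive mode;
      select coalesce(sum(rec_cnt), 0) into total
        from rec_count where tablename = t_name;
      delete from rec_count where tablename = t_name;
      insert into rec_count(tablename, rec_cnt) values (t_name, total);
      return total;
  end;
  $$ language plpgsql;

  -- e.g. from cron:  select compact_count('foo');

(The re-sync above has the same wrinkle: for an exact result, foo has
to be quiet while the count(*) runs.)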
--
output = reverse("ofni.sesabatadxunil" "@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/languages.html
Rules of the Evil Overlord #174. "If I am dangling over a precipice
and the hero reaches his hand down to me, I will not attempt to pull
him down with me. I will allow him to rescue me, thank him properly,
then return to the safety of my fortress and order his execution."
<http://www.eviloverlord.com/>
