Re: count(*) performance improvement ideas

From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: count(*) performance improvement ideas
Date: 2008-03-12 15:44:10
Message-ID: 47D7FA4A.2060109@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Pavan Deolasee wrote:
> I am reading discussion about improving count(*) performance. I have
> also seen a TODO for this.
>
> Many people have suggested TRIGGER based solution to the slow count(*)
> problem. I looked at the following link which presents the solution
> neatly.
>
> http://www.varlena.com/GeneralBits/120.php
>
> But how does that really work for SERIALIZABLE transactions ? If
> two concurrent transactions INSERT/DELETE rows from a table,
> the trigger execution of one of the transactions is bound to fail
> because of concurrent access. Even for READ COMMITTED transactions,
> the trigger execution would wait if the other transaction has executed
> the trigger on the same table. Well, I think the READ COMMITTED case
> can be handled with DEFERRED triggers, but that may require queuing up
> too many triggers if there are many inserts/deletes in a transaction.
>
> Running trigger for every insert/delete seems too expensive. I wonder
> if we can have a separate "counter" table (as suggested in the TRIGGER
> based solution) and track total number of tuples inserted and deleted
> in a transaction (and all the committed subtransactions). We then
> execute a single UPDATE at the end of the transaction. With HOT,
> updating the "counter" table should not be a big pain since all these
> updates can potentially be HOT updates. Also since the update of
> the "counter" table happens at the commit time, other transactions
> inserting/deleting from the same user table may need to wait for a
> very small period on the "counter" table tuple.
>
> This still doesn't solve the serializable transaction problem
> though. But I am sure we can figure out some solution for that case
> as well if we agree on the general approach.
>
> I am sure this must have been discussed before. So what are the
> objections

If you are talking about automatically doing this for every table - I
have an objection that the performance impact seems unwarranted against
the gain. We are still talking about every insert or update updating
some counter table, with the only mitigating factor being that the
trigger would be coded deeper into PostgreSQL theoretically making it
cheaper?

You can already today create a trigger on insert that will append to a
summary table of some sort, whose only purpose is to maintain counts. At
the simplest, it is as somebody else suggested where you might have the
other table only store the primary keys with foreign key references back
to the main table for handling deletes and updates. Storing transaction
numbers and such might allow the data to be reduced in terms of size
(and therefore elapsed time to scan), but it seems complex.

If this really is a problem that must be solved - I prefer the
suggestion from the past of keeping track of live rows per block for a
certain transaction range, and any that fall within this range can check
off this block quickly with an exact count, then the exceptional blocks
(the ones being changed) can be scanned to be sure. But, it's still
pretty complicated to implement right and maintain, for what is probably
limited gain. I don't personally buy into the need to do exact count(*)
on a whole table quickly. I know people ask for it - but I find these
same people either confused, or trying to use this functionality to
accomplish some other end, under the assumption that because they can
get counts faster from other databases, therefore PostgreSQL should do
it as well. I sometimes wonder whether these people would even notice if
PostgreSQL translated count(*) on the whole table to query reltuples. :-)

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-03-12 15:47:49 Re: TODO-list on wiki (was: TODO update about SQLSTATE to PGconn)
Previous Message Magnus Hagander 2008-03-12 15:43:21 Re: TODO-list on wiki (was: TODO update about SQLSTATE to PGconn)