Re: Seqscan in MAX(index_column)

From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Seqscan in MAX(index_column)
Date: 2003-09-05 03:44:31
Message-ID: m3oexz3nmo.fsf@chvatal.cbbrowne.com
Lists: pgsql-hackers

Oops! pgman(at)candle(dot)pha(dot)pa(dot)us (Bruce Momjian) was seen spray-painting on a wall:
> Neil Conway wrote:
>> On Thu, 2003-09-04 at 22:02, Bruce Momjian wrote:
>> > My idea is that if a transaction doing a COUNT(*) would first look to
>> > see if there already was a visible cached value, and if not, it would do
>> > the COUNT(*) and insert into the cache table. Any INSERT/DELETE would
>> > remove the value from the cache. As I see it, the commit of the
>> > INSERT/DELETE transaction would then auto-invalidate the cache at the
>> > exact time the transaction commits. This would allow MVCC visibility of
>> > the counts.
>>
>> But this means that some of the time (indeed, *much* of the time),
>> COUNT(*) would require a seqscan of the entire table. Since at many
>> sites that will take an enormous amount of time (and disk I/O),
>> that makes this solution infeasible IMHO.
>>
>> In general, I don't think this is worth doing.
>
> It is possible it isn't worth doing. Can the INSERT/DELETE
> incrementing/decrementing the cached count work reliably?

Wouldn't this be more or less the same thing as having a trigger that,
upon each insert/delete, runs "update pg_counts set count = count + 1
where reltable = 45232;"? (... where the + 1 would be - 1 for deletes,
and where 45232 is the OID of the table...)
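
A minimal, runnable sketch of the trigger idea above, under loud
assumptions: it uses Python's built-in sqlite3 module as a stand-in for
PostgreSQL, a hypothetical table called "items", and keys pg_counts by
table name rather than by OID. It only illustrates the bookkeeping; it
says nothing about how PostgreSQL would behave under MVCC or concurrent
load.

```python
import sqlite3

# In-memory SQLite database; a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical user table whose row count we want to track.
CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT);

-- Counter table modelled on the pg_counts idea in the message.
-- Keyed by table name here instead of OID (assumption).
CREATE TABLE pg_counts (reltable TEXT PRIMARY KEY, count INTEGER NOT NULL);
INSERT INTO pg_counts VALUES ('items', 0);

-- Each inserted row bumps the counter by one ...
CREATE TRIGGER items_count_ins AFTER INSERT ON items
BEGIN
    UPDATE pg_counts SET count = count + 1 WHERE reltable = 'items';
END;

-- ... and each deleted row decrements it.
CREATE TRIGGER items_count_del AFTER DELETE ON items
BEGIN
    UPDATE pg_counts SET count = count - 1 WHERE reltable = 'items';
END;
""")


def cached_count(table):
    """Read the trigger-maintained count instead of scanning the table."""
    row = conn.execute(
        "SELECT count FROM pg_counts WHERE reltable = ?", (table,)
    ).fetchone()
    return row[0]


# Three inserts followed by one delete.
conn.executemany("INSERT INTO items (payload) VALUES (?)",
                 [("a",), ("b",), ("c",)])
conn.execute("DELETE FROM items WHERE payload = 'b'")

# The real count, obtained the expensive way, for comparison.
real_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

Every write to "items" also writes the single matching row in
pg_counts, so all writers end up touching that one row.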

Technically, it seems _feasible_, albeit with the problem that it
turns pg_counts into a pretty horrid bottleneck. If lots of backends
are updating that table, then row 45232 in pg_counts becomes
troublesome because all those processes have to serialize around
updating it.

And if I have tables where I insert lots of data, but couldn't care
less how many rows they have, this effort is wasted.

When I was curious as to how COUNT might be maintained, I was pretty
sure that this wouldn't be the preferred method...
--
If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me
http://cbbrowne.com/info/emacs.html
:FATAL ERROR -- ERROR IN ERROR HANDLER
