Re: count(*) slow on large tables

From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: count(*) slow on large tables
Date: 2003-10-04 23:33:46
Message-ID: m3u16ovaqt.fsf@wolfe.cbbrowne.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Quoth tgl(at)sss(dot)pgh(dot)pa(dot)us (Tom Lane):
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
>> We do have a TODO item:
>> * Consider using MVCC to cache count(*) queries with no WHERE clause
>
>> The idea is to cache a recent count of the table, then have
>> insert/delete add +/- records to the count. A COUNT(*) would get the
>> main cached record plus any visible +/- records. This would allow the
>> count to return the proper value depending on the visibility of the
>> requesting transaction, and it would require _no_ heap or index scan.
>
> ... and it would give the wrong answers. Unless the cache is somehow
> snapshot-aware, so that it can know which other transactions should be
> included in your count.

[That's an excellent summary that Bruce did of what came out of the
previous discussion...]

If this "cache" was a table, itself, the visibility of its records
should be identical to that of the visibility of the "real" records.
+/- records would become visible when the transaction COMMITed, at the
very same time the source records became visible.

I thought, at one point, that it would be a slick idea for "record
compression" to take place automatically; when you do a COUNT(*), the
process would include compressing multiple records down to one.
Unfortunately, that turns out to be Tremendously Evil if the same
COUNT(*) were being concurrently processed in multiple transactions.
Both would repeat much the same work, and this would ultimately lead
to one of the transactions aborting. [I recently saw this effect
occur, um, a few times...]

For this not to have Evil Effects on unsuspecting transactions, we
would instead require some process analagous to VACUUM, where a single
transaction would be used to compress the "counts table" down to one
record per table. Being independent of "user transactions," it could
safely compress the data without injuring unsuspecting transactions.

But in most cases, the cost of this would be pretty prohibitive.
Every transaction that adds a record to a table leads to a record
being added to table "pg_exact_row_counts". If transactions typically
involve adding ONE row to any given table, this effectively doubles
the update traffic. Ouch. That means that in a _real_
implementation, it would make sense to pick and choose the tables that
would be so managed.

In my earlier arguing of "You don't really want that!", while I may
have been guilty of engaging in a _little_ hyperbole, I was certainly
_not_ being facetious overall. At work, we tell the developers "avoid
doing COUNT(*) inside ordinary transactions!", and that is certainly
NOT facetious comment. I recall a case a while back where system
performance was getting brutalized by a lurking COUNT(*). (Combined
with some other pathological behaviour, of course!) And note that
this wasn't a query that the TODO item could address; it was of the
form "SELECT COUNT(*) FROM SOME_TABLE WHERE OWNER = VALUE;"

As you have commented elsewhere in the thread, much of the time, the
point of asking for COUNT(*) is often to get some idea of table size,
where the precise number isn't terribly important in comparison with
getting general magnitude. Improving the ability to get approximate
values would be of some value.

I would further argue that "SELECT COUNT(*) FROM TABLE" isn't
particularly useful even when precision _is_ important. If I'm
working on reports that would be used to reconcile things, the queries
I use are a whole lot more involved than that simple form. It is far
more likely that I'm using a GROUP BY.

It is legitimate to get wishful and imagine that it would be nice if
we could get the value of that query "instantaneously." It is also
legitimate to think that the effort required to implement that might
be better used on improving other things.
--
(reverse (concatenate 'string "ac.notelrac.teneerf" "@" "454aa"))
http://www3.sympatico.ca/cbbrowne/
"very few people approach me in real life and insist on proving they
are drooling idiots." -- Erik Naggum, comp.lang.lisp

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2003-10-04 23:44:40 Re: pgsql-server/ oc/src/sgml/runtime.sgml rc/back ...
Previous Message Andrew Dunstan 2003-10-04 23:15:15 Re: [HACKERS] initdb

Browse pgsql-performance by date

  From Date Subject
Next Message Bruce Momjian 2003-10-05 04:20:32 Re: COUNT(*) again (was Re: [HACKERS] Index/Function organized
Previous Message Tom Lane 2003-10-04 21:15:24 Re: COUNT(*) again (was Re: [HACKERS] Index/Function organized table layout)