Re: [RFC] speed up count(*)

From: Joe Conway <mail(at)joeconway(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] speed up count(*)
Date: 2021-10-21 13:09:26
Message-ID: fa2688b8-c479-6e3d-f40d-3a46d4474846@joeconway.com
Lists: pgsql-hackers

On 10/20/21 2:33 PM, John Naylor wrote:
>
> On Wed, Oct 20, 2021 at 2:23 PM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >
> > Couldn't we simply inspect the visibility map, use the index data only
> > for fully visible/summarized ranges, and inspect the heap for the
> > remaining pages? That'd still be a huge improvement for tables with
> > only a few pages modified recently, which is a pretty common case.
> >
> > I think the bigger issue is that people rarely do COUNT(*) on the whole
> > table. There are usually other conditions and/or GROUP BY, and I'm not
> > sure how that would work.
>
> Right. My (possibly hazy) recollection is that people don't have quite
> as high an expectation for queries with more complex predicates and/or
> grouping. It would be interesting to see what the balance is.

I think you are exactly correct. People seem to understand that with a
predicate it is harder, but they expect

select count(*) from foo;

to be nearly instantaneous, and they don't really need it to be exact.
The stock answer for that has been to do

select reltuples from pg_class
where relname = 'foo';

But that is unsatisfying because the problem is often with some
benchmark or another that cannot be changed.
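
For reference, a slightly more defensive form of the catalog lookup above,
as a sketch only. It assumes the table lives in schema "public" (the
schema name is my assumption, not something stated in this thread) and
resolves it through regclass, so a same-named table in another schema
cannot produce a second row:

```sql
-- Estimated row count from planner statistics; no heap scan is performed.
-- 'public.foo' is an assumed schema-qualified name; adjust as needed.
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE oid = 'public.foo'::regclass;
```

The result is only as fresh as the last VACUUM or ANALYZE (autovacuum
included), and on PostgreSQL 14 and later reltuples is -1 for a table
that has never been vacuumed or analyzed, so callers should treat a
negative value as "unknown" rather than as a count.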

I'm sure this idea will be shot down in flames <donning flameproof
suit>, but what if we had a default "off" GUC which could be turned on
causing the former to be transparently rewritten into the latter
</donning flameproof suit>?

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
