Re: Select max(foo) and select count(*) optimization

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "D'Arcy J(dot)M(dot) Cain" <darcy(at)druid(dot)net>
Cc: Shridhar Daithankar <shridhar_daithankar(at)persistent(dot)co(dot)in>, Postgresql Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Select max(foo) and select count(*) optimization
Date: 2004-01-06 23:19:55
Message-ID: 2698.1073431195@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

"D'Arcy J.M. Cain" <darcy(at)druid(dot)net> writes:
> In any case, if I have to vacuum a 20,000,000 row table to get an accurate
> count then I may as well run count(*) on it.
> (*): Actually I only analyze but I understand that that should be sufficient.

ANALYZE without VACUUM will deliver a not-very-accurate estimate, since it
only looks at a sample of the table's pages and doesn't grovel through
every one. Any of the VACUUM variants, on the other hand, will set
pg_class.reltuples reasonably accurately (as the number of rows actually
seen and left undeleted by the VACUUM pass).

There are pathological cases where ANALYZE's estimate of the overall row
count can be horribly bad --- mainly, when the early pages of the table
are empty or nearly so, but there are well-filled pages out near the
end. I have a TODO item to try to make ANALYZE less prone to getting
fooled that way...

regards, tom lane

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message D. Dante Lorenso 2004-01-07 16:57:38 Find original number of rows before applied LIMIT/OFFSET?
Previous Message Tom Lane 2004-01-06 23:12:35 Re: PgAdmin startup query VERY slow