VACUUM and ANALYZE disagreeing on what reltuples means

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: VACUUM and ANALYZE disagreeing on what reltuples means
Date: 2017-07-24 22:47:36
Message-ID: 16db4468-edfa-830a-f921-39a50498e77e@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

It seems to me that VACUUM and ANALYZE somewhat disagree on what exactly
reltuples means. VACUUM seems to be thinking that

reltuples = live + dead

while ANALYZE apparently believes that

reltuples = live

This causes somewhat bizarre changes in the value, depending on which of
those commands was executed last.

To demonstrate the issue, let's create a simple table with 1M rows,
delete 10% rows and then we'll do a bunch of VACUUM / ANALYZE and check
reltuples, n_live_tup and n_dead_tup in the catalogs.

I've disabled autovacuum so that it won't interfere with this, and
there's another transaction blocking VACUUM from actually cleaning any
dead tuples.

test=# create table t as
select i from generate_series(1,1000000) s(i);

test=# select reltuples, n_live_tup, n_dead_tup
from pg_stat_user_tables join pg_class using (relname)
where relname = 't';

reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
1e+06 | 1000000 | 0

So, that's nice. Now let's delete 10% of rows, and run VACUUM and
ANALYZE a few times.

test=# delete from t where random() < 0.1;

test=# vacuum t;

test=# select reltuples, n_live_tup, n_dead_tup
from pg_stat_user_tables join pg_class using (relname)
where relname = 't';

reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
1e+06 | 900413 | 99587

test=# analyze t;

reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
900413 | 900413 | 99587

test=# vacuum t;

reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
1e+06 | 900413 | 99587

So, analyze and vacuum disagree.

To further confuse the poor DBA, VACUUM always simply ignores the old
values while ANALYZE combines the old and new values on large tables
(and converges to the "correct" value after a few steps). This table is
small (less than 30k pages), so ANALYZE does not do that.

This is quite annoying, because people tend to look at reltuples while
investigating bloat (e.g. because the check_postgres query mentioned on
our wiki [1] uses reltuples in the formula).

[1] https://wiki.postgresql.org/wiki/Show_database_bloat

And when the cleanup is blocked for some reason (as in the example
above), VACUUM tends to be running much more often (because it can't
cleanup anything). So reltuples tend to be set to the higher value,
which I'd argue is the wrong value for estimating bloat.

I haven't looked at the code yet, but I've confirmed this happens both
on 9.6 and 10. I haven't checked older versions, but I guess those are
affected too.

The question is - which of the reltuples definitions is the right one?
I've always assumed that "reltuples = live + dead" but perhaps not?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-07-24 22:55:45 Re: VACUUM and ANALYZE disagreeing on what reltuples means
Previous Message Tom Lane 2017-07-24 22:16:47 Re: Fwd: Syncing sql extension versions with shared library versions