Re: Heavily modified big table bloat even in auto vacuum is running

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Haribabu kommi <haribabu(dot)kommi(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Heavily modified big table bloat even in auto vacuum is running
Date: 2014-01-19 00:29:37
Message-ID: 27056.1390091377@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> writes:
> I am marking this (based on patch vacuum_fix_v7_nkeep.patch) as Ready
> For Committer.

I've reviewed and committed this patch, with one significant change.
If you look at the way that vacuumlazy.c computes new_rel_tuples, it's
based on scanned_tuples (lazy_scan_heap's num_tuples), which is the total
number of surviving tuples *including* the recently-dead ones counted in
nkeep. This is the number that we want to put into pg_class.reltuples,
I think, but it's wrong for the pgstats stuff to use it as n_live_tuples
if we're going to count the recently-dead ones as dead. That is, if we're
improving the approximation that n_dead_tuples is zero after a vacuum,
the fix should involve reducing the n_live_tuples estimate as well as
increasing the n_dead_tuples estimate.

Using your test script against the unpatched code, it's easy to see that
there's a large (and wrong) value of n_live_tup reported by an autovacuum,
which gets corrected by the next autoanalyze. For instance note these
successive readouts from the pg_stat_all_tables query:

n_live_tup | autovacuum_count | autoanalyze_count | n_dead_tup
------------+------------------+-------------------+------------
497365 | 9 | 8 | 4958346
497365 | 9 | 8 | 5458346
1186555 | 10 | 8 | 0
1186555 | 10 | 8 | 500000
499975 | 10 | 9 | 2491877

Since we know the true number of live tuples is always exactly 500000
in this test, that jump is certainly wrong. With the committed patch,
the behavior is significantly saner:

n_live_tup | autovacuum_count | autoanalyze_count | n_dead_tup
------------+------------------+-------------------+------------
483416 | 2 | 2 | 5759861
483416 | 2 | 2 | 6259861
655171 | 3 | 2 | 382306
655171 | 3 | 2 | 882306
553942 | 3 | 3 | 3523744

Still some room for improvement, but it's not so silly anymore.

It strikes me that there may be an obvious way to improve the number
further, based on the observation in this thread that nkeep doesn't need
to be scaled up because VACUUM should have scanned every page that could
contain dead tuples. Namely, that we're arriving at new_rel_tuples by
scaling up num_tuples linearly --- but perhaps we should only scale up
the live-tuples fraction of that count, not the dead-tuples fraction.
By scaling up dead tuples too, we are presumably still overestimating
new_rel_tuples somewhat, and the behavior that I'm seeing with this test
script seems to confirm that. I haven't time to pursue this idea at the
moment, but perhaps someone else would like to.

The n_dead_tup values seem to still be on the high side (not the low
side) when I run this test. Not too sure why that is.

Also, I don't see any particularly meaningful change in the rate of
autovacuuming or autoanalyzing when using default postgresql.conf
settings. I see that your test case changes the autovac parameters quite
a bit, but I didn't bother installing those values here. This may or may
not mean much; the fix is clearly correct on its own terms.

On the whole, this patch is not really addressing the question of
accounting for transactions changing the table concurrently with
VACUUM; it's only fixing the impedance mismatch between pgstats wanting
to count live and dead tuples separately while VACUUM wasn't telling
it that. That's a good thing to do, but I think we still have some
issues here.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2014-01-19 01:12:30 Re: Patch to add support of "IF NOT EXISTS" to others "CREATE" statements
Previous Message Marti Raudsepp 2014-01-19 00:14:32 [patch] Potential relcache leak in get_object_address_attribute