Nonrandom scanned_pages distorts pg_class.reltuples set by VACUUM

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Nonrandom scanned_pages distorts pg_class.reltuples set by VACUUM
Date: 2022-02-13 21:43:59
Message-ID: CAH2-WzkNKORurux459M64mR63Aw4Jq7MBRVcX=CvALqN3A88WA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

My recent commit 44fa8488 made vacuumlazy.c always scan the last page
of every heap relation, even when it's all-visible, regardless of any
other factor. The general idea is to always examine the last page in
the relation when that might prevent us from unsuccessfully attempting
to truncate the relation -- since even the attempt is costly (in
particular, we have to get an AccessExclusiveLock, which is painful
with a Hot Standby replica). This created a new problem: if we
manually VACUUM a smaller, static table many times,
vac_estimate_reltuples() will consider the same scanned_page to be a
random sample each time. Of course this page is the further possible
thing from a random sample, because it's the same heap page each time,
and because it's generally more likely that the last page will have
fewer tuples.

Every time you run VACUUM (without changing the table),
pg_class.reltuples shrinks, by just a small amount. Run VACUUM like
that enough times, and pg_class.reltuples will eventually become quite
distorted. It's easy to spot this effect, just by running "VACUUM
VERBOSE" in a loop against a static table, and noting how "tuples:
.... ??? remain" tends to shrink over time, even though the contents
of the table won't have changed.

The "always scan last page" behavior isn't really new, actually. It's
closely related to an earlier, slightly different behavior with the
same goal -- though one that conditioned scanning the last page on
certain other factors that were protective (which was probably never
considered by the design). This closely related behavior (essentially
the same behavior) was added by commit e8429082 from 2015. I judged
that it wasn't worth worrying about scanning an extra page
unnecessarily (better to keep things simple), since it's not at all
uncommon for VACUUM to scan a few all-visible pages anyway (due to the
SKIP_PAGES_THRESHOLD stuff). And I haven't changed my mind about that
-- I still think that we should just scan the last page in all cases,
to keep things simple. But I definitely don't think that this
"pg_class.reltuples gets distorted over time" business is okay -- that
has to be fixed.

There has been at least one very similar bug in the past: the commit
message from commit b4b6923e (from 2011) talks about a similar issue
with over-sampling from the beginning of the table (not the end), due
to a similar bias involving visibility map implementation details that
lead to an extremely nonrandom scanned_pages sample. To me that
suggests that the true problem here isn't really in vacuumlazy.c --
going back to the old approach (i.e. scanning the last page
conditionally) seems like a case of the tail wagging the dog. IMV the
real problem is that vac_estimate_reltuples() is totally unconcerned
about these kinds of systematic sampling problems.

Attached draft patch fixes the issue by making
vac_estimate_reltuples() return the old/original reltuples from
pg_class (and not care at all about the scanned_tuples value from its
vacuumlazy.c caller) when:

1. The total_pages for the table hasn't changed (by even one page)
since the last VACUUM (or last ANALYZE).

and,

2. The total number of scanned_pages is less than 2% of total_pages.

This fixes the observed problem directly. It also makes the code
robust against other similar problems that might arise in the future.
The risk that comes from trusting that scanned_pages is a truly random
sample (given these conditions) is generally very large, while the
risk that comes from disbelieving it (given these same conditions) is
vanishingly small.

--
Peter Geoghegan

Attachment Content-Type Size
v1-0001-Avoid-vac_estimate_reltuples-distortion.patch application/octet-stream 1.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-02-13 21:53:19 Re: Adding CI to our tree
Previous Message Nikolay Shaplov 2022-02-13 21:43:36 [PATCH] New [relation] option engine