Re: Removing vacuum_cleanup_index_scale_factor

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Removing vacuum_cleanup_index_scale_factor
Date: 2021-03-08 22:35:03
Message-ID: CAH2-Wznw+Cc=DXctr21rWFh+CiCcAUn+Q+Hkw=xKbGiYRZrNYA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 8, 2021 at 1:38 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> As you say, the history here is a bit convoluted, but it seems like
> a good principle to avoid interconnections between VACUUM and ANALYZE
> as much as we can. I haven't been paying enough attention to this
> thread to have more insight than that.

The attached patch does what I proposed earlier today: it teaches
do_analyze_rel() to always set pg_class.reltuples for indexes when it
would do the same thing for the heap/table relation already. It's now
uniform in that sense.

Also included is a patch that removes the
vacuum_cleanup_index_scale_factor mechanism for triggering an index
scan during VACUUM -- that's what the second patch does (this depends
on the first patch, really).

Do you think that a backpatch to Postgres 13 for both of these patches
would be acceptable? There are two main concerns that I have in mind
here, both of which are only issues in Postgres 13:

1. Arguably the question of skipping scanning the index should have been
considered by the autovacuum_vacuum_insert_scale_factor patch when it
was committed for Postgres 13 -- but it wasn't. There is a regression
that was tied to autovacuum_vacuum_insert_scale_factor in Postgres 13
by Mark Callaghan:

https://smalldatum.blogspot.com/2021/01/insert-benchmark-postgres-is-still.html

The blog post says: "Updates - To understand the small regression
mentioned above for the l.i1 test (more CPU & write IO) I repeated the
test with 100M rows using 2 configurations: one disabled index
deduplication and the other disabled insert-triggered autovacuum.
Disabling index deduplication had no effect and disabling
insert-triggered autovacuum resolves the regression."

I think that this regression is almost entirely explainable by the
need to unnecessarily scan indexes for autovacuum VACUUMs that just
need to set the visibility map. This issue is basically avoidable,
just by removing the vacuum_cleanup_index_scale_factor cleanup-only
VACUUM criteria (per my second patch).

2. I fixed a bug in nbtree deduplication btvacuumcleanup() stats in
commit 48e12913. This fix still left things in kind of a bad state:
there are still cases where the btvacuumcleanup()-only VACUUM case
will set pg_class.reltuples to a value that is significantly below
what it should be (it all depends on how effective deduplication is
with the data). These remaining cases are effectively fixed by the
second patch.

I probably should have made btvacuumcleanup()-only VACUUMs set
"stats->estimate_count = true" when I was working on the fix that
became commit 48e12913. Purely because my approach was inherently
approximate with posting list tuples, and so shouldn't be trusted for
anything important (num_index_tuples is suitable for VACUUM VERBOSE
output only in affected cases). I didn't set "stats->estimate_count =
true" in affected cases because I was worried about unforeseen
consequences. But this seems defensible now, all things considered.

There are other things that are slightly broken but will be fixed by
the first patch. But I'm really just worried about these two cases in
Postgres 13.

Thanks for weighing in
--
Peter Geoghegan

Attachment Content-Type Size
0001-VACUUM-ANALYZE-Always-set-pg_class.reltuples.patch application/x-patch 5.0 KB
0002-Remove-vacuum_cleanup_index_scale_factor-GUC-param.patch application/x-patch 27.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2021-03-08 22:37:16 Re: PATCH: Batch/pipelining support for libpq
Previous Message Jacob Champion 2021-03-08 22:16:23 Re: Proposal: Save user's original authenticated identity for logging