Re: Teaching users how they can get the most out of HOT in Postgres 14

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Teaching users how they can get the most out of HOT in Postgres 14
Date: 2021-06-20 16:55:36
Message-ID: CAH2-WzkFLZewAyiUX59NN+azw2Rf17o07t6aut9w08v_NB246g@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jun 20, 2021 at 9:22 AM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> I'd want to see some evidence that the GUC is necessary. (For that matter, why is a per relation setting necessary?) Is there a reproducible pathological case, perhaps with a pgbench script, to demonstrate the need? I'm not asking whether there might be some regression, but rather whether somebody wants to construct a worst-case pathological case and publish quantitative results about how bad it is.

One clear argument in favor of the VACUUM option (not so much the
reloption) is that it enables certain testing scenarios.

For example, I was recently using pg_visibility to do a low-level
analysis of how visibility map bits were getting set with a test case
that built on the BenchmarkSQL fair-use TPC-C implementation. The
optimization's effects were something I noticed in certain scenarios
-- I could have used the option of disabling it at the VACUUM command
level just to get a perfectly clean slate. It left a small fraction
of the pages in the table not set all-visible, which would be
inconsequential to users but was annoying in the context of this
particular test scenario.
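
To give a concrete idea of the workflow, here is a minimal sketch
(the table name is just an illustrative pick from the BenchmarkSQL
schema):

    -- Force index/heap vacuuming so the bypass optimization can
    -- never apply (INDEX_CLEANUP is ternary in PG 14: AUTO/ON/OFF)
    VACUUM (INDEX_CLEANUP ON, VERBOSE) bmsql_order_line;

    -- Count heap pages still not marked all-visible in the VM
    CREATE EXTENSION IF NOT EXISTS pg_visibility;
    SELECT count(*) AS not_all_visible
    FROM pg_visibility('bmsql_order_line')
    WHERE NOT all_visible;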

The way the optimization works, it will only ever leave an affected
table in a state where the LP_DEAD items left behind are highly
unlikely to be counted by ANALYZE. They would not be counted
accurately anyway, either because they're extremely few in number or
because the relatively many that remain are concentrated in just a
few heap blocks -- a consequence of how block-based sampling by
ANALYZE works.
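
A quick way to see that in practice (again, the table name is purely
illustrative) is to compare the post-ANALYZE dead-tuple estimate
against what you know is really there:

    ANALYZE bmsql_order_line;
    SELECT relname, n_live_tup, n_dead_tup, last_analyze
    FROM pg_stat_user_tables
    WHERE relname = 'bmsql_order_line';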

In short, even if there really were a performance problem implied by
the bypass indexes optimization, it seems unlikely that autovacuum
would run in the first place to take care of it, with or without the
optimization -- even if autovacuum_vacuum_scale_factor were set very
aggressively. VACUUM (really autovacuum) just doesn't tend to work at
that level of precision.
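
To put rough numbers on that, using the standard trigger formula and
a made-up table size:

    autovacuum trigger = autovacuum_vacuum_threshold
                       + autovacuum_vacuum_scale_factor * reltuples
                       = 50 + 0.01 * 10,000,000
                       = 100,050 dead tuples

Even with a scale factor as aggressive as 0.01, a 10 million row
table needs ~100k dead tuples before autovacuum does anything, which
is far more than the optimization will ever leave behind -- it only
skips index vacuuming when LP_DEAD items are confined to a small
fraction of the table's heap pages.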

--
Peter Geoghegan
