Re: Teaching users how they can get the most out of HOT in Postgres 14

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Teaching users how they can get the most out of HOT in Postgres 14
Date: 2021-05-18 14:28:43
Message-ID: CAD21AoAWxEA6D3OVMLKat_dgjjdir5=vSSS6gb0aXcUs-xW_nw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(I had missed this discussion due to the mismatched thread subject..)

On Fri, May 14, 2021 at 11:14 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Thu, May 13, 2021 at 01:27:44PM -0700, Peter Geoghegan wrote:
> > Almost all of the benefit of the optimization is available with the
> > current BYPASS_THRESHOLD_PAGES threshold (2% of heap pages have
> > LP_DEAD items), which has less risk than a higher threshold. I don't
> > think it matters much if we have the occasional unnecessary round of
> > index vacuuming on account of not applying the optimization. The truly
> > important benefit of the optimization is to not do unnecessary index
> > vacuuming all the time in the extreme cases where it's really hard to
> > justify. There is currently zero evidence that anything higher than 2%
> > will ever help anybody to an appreciably degree. (There is also no
> > evidence that the optimization will ever need to be disabled, but I
> > accept the need to be conservative and offer an off switch -- the
> > precautionary principle applies when talking about new harms.)
> >
> > Not having to scan every index on every VACUUM, but only every 5th or
> > so VACUUM is a huge improvement. But going from every 5th VACUUM to
> > every 10th VACUUM? That's at best a tiny additional improvement in
> > exchange for what I'd guess is a roughly linear increase in risk
> > (maybe a greater-than-linear increase, even). That's an awful deal.
>
> Perhaps that's an awful deal, but based on which facts can you really
> say that this new behavior of needing at least 2% of relation pages
> with some dead items to clean up indexes is not a worse deal in some
> cases? This may cause more problems for the in-core index AMs, as
> much as it could impact any out-of-core index AM, no? What about
> other values like 1%, or even 5%? My guess is that there would be an
> ask to have more control on that, though that stands as my opinion.

I'm concerned how users can tune that scale type parameter that can be
configurable between 0.0 and 0.05. I think that users basically don't
pay attention to how many blocks are updated by UPDATE/DELETE. Unlike
old vacuum_cleanup_index_scale_factor, increasing this parameter would
directly affect index bloats. If the user can accept more index bloat
to speed up (auto)vacuum, they can use vacuum_index_cleanup instead.

I prefer to have an on/off switch just in case. I remember I also
commented the same thing before. We’ve discussed a way to control
whether or not to enable the skipping optimization by adding a new
mode to INDEX_CLEANUP option, as Peter mentioned. For example, we can
use the new mode “auto” (or “smart”) mode by default, enabling all
skipping optimizations, and specifying “on” disables them. Or we can
add “force” mode to disable all skipping optimizations while leaving
the existing modes as they are. Anyway, I think it’s not a good idea
to add a new GUC parameter that we’re not sure how to tune.

IIUC skipping index vacuum when less than 2% of relation pages with at
least one LP_DEAD is a table’s optimization rather than a btree
index’s optimization. Since we’re not likely to set many pages
all-visible or collect many dead tuples in that case, we can skip
index vacuuming and table vacuuming. IIUC this case, fortunately, goes
well together btree indexes’ bottom-up deletion. If this skipping
behavior badly affects other indexes AMs, this optimization should be
considered within btree indexes, although we will need a way for index
AMs to consider and tell the vacuum strategy. But I guess this works
well in most cases so having an on/off switch suffice.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2021-05-18 14:34:03 Re: PG 14 release notes, first draft
Previous Message Bharath Rupireddy 2021-05-18 14:16:39 Re: postgres_fdw - should we tighten up batch_size, fetch_size options against non-numeric values?