Re: [BUG] Autovacuum not dynamically decreasing cost_limit and cost_delay

From: Scott Mead <scott(at)meads(dot)us>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "Mead, Scott" <meads(at)amazon(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUG] Autovacuum not dynamically decreasing cost_limit and cost_delay
Date: 2021-10-26 15:23:41
Message-ID: CAJsHxiCtvWYAsK3E7FGw_Lpdbh7eF3St2uptsPP-Uz5VJgjjug@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Wed, May 26, 2021 at 4:01 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:

> On Wed, Apr 14, 2021 at 11:17 PM Mead, Scott <meads(at)amazon(dot)com> wrote:
> >
> >
> >
> > > On Mar 1, 2021, at 8:43 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> wrote:
> > >
> > > On Mon, Feb 8, 2021 at 11:49 PM Mead, Scott <meads(at)amazon(dot)com> wrote:
> > >>
> > >> Hello,
> > >> I recently looked at what it would take to make a running
> autovacuum pick up a change to either cost_delay or cost_limit. Users
> frequently have a conservative value set and then wish to change it
> when autovacuum initiates a freeze on a relation. Most users only find
> out they are in ‘to prevent wraparound’ mode after it has happened; this
> means that if they want the vacuum to take advantage of more I/O, they need
> to stop and then restart the currently running vacuum (after reloading the
> GUCs).
> > >>
> > >> Initially, my goal was to determine feasibility for making this
> dynamic. I added debug code to vacuum.c:vacuum_delay_point(void) and found
> that changes to cost_delay and cost_limit are already processed by a
> running vacuum. However, there was a bug preventing cost_delay or cost_limit
> from being configured to allow higher throughput.
> > >>
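
For readers less familiar with the mechanism, here is a simplified, hypothetical model of a cost-based delay point, written only to illustrate the observation above. It is not the code in vacuum.c: SketchThrottle and sketch_delay_point are invented names, and the model assumes the sleep length is proportional to how far the accumulated cost balance has overshot the limit. The step that matters for this thread is the last one, where the worker adopts whatever limit and delay are currently published (assumed here to be the result of a reload plus rebalance) before its next round of work.

```c
/* Hypothetical per-worker throttling state; names are illustrative only. */
typedef struct
{
    int    cost_balance;  /* cost accumulated since the last sleep */
    int    cost_limit;    /* limit currently in effect (<= 0: unthrottled) */
    double cost_delay;    /* delay in ms currently in effect */
} SketchThrottle;

/*
 * Charge `cost` units of work. If the balance has reached the limit,
 * reset the balance, adopt the currently published limit/delay (assumed
 * to come from a reload plus rebalance), and return how many ms to
 * sleep. Returns 0.0 when no sleep is needed.
 */
double
sketch_delay_point(SketchThrottle *t, int cost,
                   int published_limit, double published_delay)
{
    double msec;

    t->cost_balance += cost;
    if (t->cost_limit <= 0 || t->cost_balance < t->cost_limit)
        return 0.0;

    /* sleep in proportion to how far past the limit we went */
    msec = t->cost_delay * t->cost_balance / t->cost_limit;
    t->cost_balance = 0;

    /* pick up the new settings for the next round (the dynamic part) */
    t->cost_limit = published_limit > 0 ? published_limit : 1;
    t->cost_delay = published_delay;
    return msec;
}
```

For example, starting from limit 100 and delay 100 ms, charging 50 and then 100 cost units triggers one sleep of 100 × 150 / 100 = 150 ms, after which a newly published limit of 333 is in effect.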
> > >> I believe this is a bug because currently, autovacuum will
> dynamically detect and increase the cost_limit or cost_delay, but it can
> never decrease those values beyond their setting when the vacuum began.
> The current behavior is for vacuum to limit the maximum throughput of
> currently running vacuum processes to the cost_limit that was set when the
> vacuum process began.
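
A rough way to see why the cap matters: ignoring the time spent doing actual work, a throttled worker can consume at most about cost_limit cost units every cost_delay milliseconds. The helper below is a hypothetical back-of-the-envelope calculation under that assumption, not anything PostgreSQL computes.

```c
/*
 * Rough upper bound on cost units per second for a throttled worker,
 * assuming sleep time dominates (time spent doing work treated as zero).
 * Returns -1.0 when the inputs do not describe a throttled worker.
 */
double
sketch_max_cost_per_sec(int cost_limit, double cost_delay_ms)
{
    if (cost_limit <= 0 || cost_delay_ms <= 0)
        return -1.0;
    return cost_limit * (1000.0 / cost_delay_ms);
}
```

With cost_limit = 100 and cost_delay = 100 ms this gives about 1,000 cost units per second; a limit of 333 at the same delay would allow about 3,330.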
> > >
> > > Thanks for your report.
> > >
> > > I've not looked at the patch yet, but I agree that the autovacuum cost
> > > delay calculation does not seem to work correctly if vacuum-delay-related
> > > parameters (e.g., autovacuum_vacuum_cost_delay) are changed while a table
> > > is being vacuumed, in order to speed up running autovacuums. Here is my
> > > analysis:
> >
> >
> > I appreciate your in-depth analysis and will comment in-line. That
> said, I still think it’s important that the attached patch is applied. As
> it is today, a simple few lines of code prevent users from being able to
> increase the throughput on vacuums that are running without having to
> cancel them first.
> >
> > The patch that I’ve provided allows users to decrease their
> vacuum_cost_delay and get an immediate boost in performance to their
> running vacuum jobs.
> >
> >
> > >
> > > Suppose we have the following parameters and 3 autovacuum workers are
> > > running on different tables:
> > >
> > > autovacuum_vacuum_cost_delay = 100
> > > autovacuum_vacuum_cost_limit = 100
> > >
> > > Vacuum cost-based delay parameters for each worker are as follows:
> > >
> > > worker->wi_cost_limit_base = 100
> > > worker->wi_cost_limit = 66
> > > worker->wi_cost_delay = 100
>
> Sorry, worker->wi_cost_limit should be 33.
>
> > >
> > > Each running autovacuum has "wi_cost_limit = 66" because the total
> > > limit (100) is equally rationed. And another point is that the total
> > > wi_cost_limit (198 = 66*3) is less than autovacuum_vacuum_cost_limit,
> > > 100. Which are fine.
>
> So the total wi_cost_limit, 99, is less than autovacuum_vacuum_cost_limit,
> 100.
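
The rationing described above can be sketched as follows. This is a simplified model for illustration only; SketchWorker and sketch_balance are hypothetical names, not autovacuum.c. Each worker receives a share of autovacuum_vacuum_cost_limit / autovacuum_vacuum_cost_delay proportional to its own base_limit / delay ratio, and the result is clamped to the range [1, base limit], as in the comment quoted further down.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-worker fields quoted in the thread. */
typedef struct
{
    int    cost_limit_base;  /* like wi_cost_limit_base */
    double cost_delay;       /* like wi_cost_delay (ms) */
    int    cost_limit;       /* like wi_cost_limit (output) */
} SketchWorker;

/*
 * Ration guc_limit / guc_delay among the workers in proportion to each
 * worker's cost_limit_base / cost_delay ratio, then clamp to [1, base].
 */
void
sketch_balance(SketchWorker *w, size_t n, int guc_limit, double guc_delay)
{
    double cost_total = 0.0;
    double cost_avail;

    if (guc_limit <= 0 || guc_delay <= 0)
        return;

    for (size_t i = 0; i < n; i++)
        if (w[i].cost_limit_base > 0 && w[i].cost_delay > 0)
            cost_total += (double) w[i].cost_limit_base / w[i].cost_delay;

    if (cost_total <= 0)
        return;

    cost_avail = (double) guc_limit / guc_delay;
    for (size_t i = 0; i < n; i++)
    {
        if (w[i].cost_limit_base > 0 && w[i].cost_delay > 0)
        {
            int limit = (int) (cost_avail * w[i].cost_limit_base / cost_total);

            /* upper bound is the base value, lower bound is 1 */
            if (limit > w[i].cost_limit_base)
                limit = w[i].cost_limit_base;
            if (limit < 1)
                limit = 1;
            w[i].cost_limit = limit;
        }
    }
}
```

With three identical workers (base limit 100, delay 100) and both settings at 100, each worker gets (int)(1.0 × 100 / 3.0) = 33, for a total of 99, which matches the corrected figures above.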
>
> > >
> > > Now let's change the autovacuum_vacuum_cost_delay/limit values to speed up
> > > running autovacuums.
> > >
> > > Case 1: increasing autovacuum_vacuum_cost_limit to 1000.
> > >
> > > After reloading the configuration file, vacuum cost-based delay
> > > parameters for each worker become as follows:
> > >
> > > worker->wi_cost_limit_base = 100
> > > worker->wi_cost_limit = 100
> > > worker->wi_cost_delay = 100
> > >
> > > If we rationed autovacuum_vacuum_cost_limit, 1000, to 3 workers, it
> > > would be 333. But since we cap it by wi_cost_limit_base, the
> > > wi_cost_limit is 100. I think this is what Mead reported here.
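
Case 1 can be reproduced with a small hypothetical helper that assumes all workers share the same base limit and delay, and returns the proportional share both before and after the upper clamp. The names are illustrative, not PostgreSQL's.

```c
/* Share for one worker, before and after clamping to [1, base_limit]. */
typedef struct
{
    int uncapped;  /* the proportional share */
    int capped;    /* share clamped to [1, base_limit] */
} SketchShare;

/* Hypothetical helper: n identical workers (same base limit and delay). */
SketchShare
sketch_identical_share(int guc_limit, double guc_delay,
                       int base_limit, double delay, int nworkers)
{
    SketchShare s = {0, 0};
    double      cost_total;
    double      cost_avail;

    /* avoid division by zero on degenerate input */
    if (guc_limit <= 0 || guc_delay <= 0 || base_limit <= 0 ||
        delay <= 0 || nworkers <= 0)
        return s;

    cost_total = nworkers * ((double) base_limit / delay);
    cost_avail = (double) guc_limit / guc_delay;

    s.uncapped = (int) (cost_avail * base_limit / cost_total);
    s.capped = s.uncapped;
    if (s.capped > base_limit)
        s.capped = base_limit;
    if (s.capped < 1)
        s.capped = 1;
    return s;
}
```

sketch_identical_share(1000, 100.0, 100, 100.0, 3) yields an uncapped share of 333 but a capped value of 100: the clamp to the base value is what discards the increase.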
> >
> >
> > Yes, this is exactly correct. The cost_limit is capped at the
> cost_limit that was set at the start of a running vacuum. My patch
> changes this cap to be the max allowed cost_limit (10,000).
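
The effect described for the patch (raising the upper bound from the worker's base limit to the maximum cost_limit of 10,000) can be modeled by making the cap a parameter. This is a sketch of the described behavior only; the patch itself is not shown in this message, and SKETCH_MAX_COST_LIMIT and sketch_share_with_cap are hypothetical.

```c
#define SKETCH_MAX_COST_LIMIT 10000  /* the maximum cost_limit mentioned above */

/*
 * Proportional share for n identical workers, clamped to [1, upper].
 * upper = base_limit models the reported behavior; upper =
 * SKETCH_MAX_COST_LIMIT models the behavior described for the patch.
 */
int
sketch_share_with_cap(int guc_limit, double guc_delay,
                      int base_limit, double delay, int nworkers, int upper)
{
    double cost_total;
    double cost_avail;
    int    share;

    /* degenerate input: fall back to the lower bound */
    if (guc_limit <= 0 || guc_delay <= 0 || base_limit <= 0 ||
        delay <= 0 || nworkers <= 0)
        return 1;

    cost_total = nworkers * ((double) base_limit / delay);
    cost_avail = (double) guc_limit / guc_delay;
    share = (int) (cost_avail * base_limit / cost_total);

    if (share > upper)
        share = upper;
    if (share < 1)
        share = 1;
    return share;
}
```

With the Case 1 numbers, a cap of 100 gives 100 while a cap of 10,000 gives 333. Because the proportional share already divides the global budget, the three workers together stay within 1000 here (3 × 333 = 999); the upper clamp only matters when a share exceeds the cap, which is the roundoff question raised in the reply.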
>
> The comment on the worker's limit calculation says:
>
> /*
> * We put a lower bound of 1 on the cost_limit, to avoid division-
> * by-zero in the vacuum code. Also, in case of roundoff trouble
> * in these calculations, let's be sure we don't ever set
> * cost_limit to more than the base value.
> */
> worker->wi_cost_limit = Max(Min(limit,
>                                 worker->wi_cost_limit_base),
>                             1);
>
> If we use the max cost_limit as the upper bound here, the worker's
> limit could unnecessarily be higher than the base value in case of
> roundoff trouble? I think that the problem here is rather that we
> don't update wi_cost_limit_base and wi_cost_delay when rebalancing the
> cost.
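
The alternative suggested here, refreshing wi_cost_limit_base and wi_cost_delay when rebalancing, can be sketched the same way. This is a hypothetical model: SketchRefreshWorker and its fields merely stand in for the wi_* fields, and it assumes no per-table overrides, so every worker simply adopts the current global settings.

```c
#include <stddef.h>

/* Hypothetical worker record standing in for the wi_* fields. */
typedef struct
{
    int    limit_base;  /* like wi_cost_limit_base */
    double delay;       /* like wi_cost_delay (ms) */
    int    limit;       /* like wi_cost_limit (output) */
} SketchRefreshWorker;

/*
 * Refresh each worker's base limit and delay from the current settings
 * (assumes no per-table overrides), then ration as before.
 */
void
sketch_rebalance_refreshing_base(SketchRefreshWorker *w, size_t n,
                                 int guc_limit, double guc_delay)
{
    double cost_total = 0.0;
    double cost_avail;

    if (n == 0 || guc_limit <= 0 || guc_delay <= 0)
        return;

    for (size_t i = 0; i < n; i++)
    {
        w[i].limit_base = guc_limit;  /* refreshed, not frozen at start */
        w[i].delay = guc_delay;
        cost_total += (double) w[i].limit_base / w[i].delay;
    }

    cost_avail = (double) guc_limit / guc_delay;
    for (size_t i = 0; i < n; i++)
    {
        int share = (int) (cost_avail * w[i].limit_base / cost_total);

        /* the [1, base] clamp is kept, but base is now current */
        if (share > w[i].limit_base)
            share = w[i].limit_base;
        if (share < 1)
            share = 1;
        w[i].limit = share;
    }
}
```

Starting from three workers frozen at base 100 and rebalancing with autovacuum_vacuum_cost_limit = 1000 and a delay of 100, each worker ends up with a base of 1000 and a limit of 333, while the [1, base] clamp still guards against roundoff.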
>

Currently, vacuum always limits you to the cost_limit_base from the time
that your vacuum started. I'm not sure why; I don't believe it's related to
rounding, because the rest of the rebalancing code works properly. ISTM that
simply allowing the updated cost_limit is a straightforward solution, since
the rebalancing code will automatically take it into account.

>
> Regards,
>
> --
> Masahiko Sawada
> EDB: https://www.enterprisedb.com/
>
>
>

--
Scott Mead
scott(at)meads(dot)us
