Re: [BUG] Autovacuum not dynamically decreasing cost_limit and cost_delay

From: Scott Mead <scott(at)meads(dot)us>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "Mead, Scott" <meads(at)amazon(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: [BUG] Autovacuum not dynamically decreasing cost_limit and cost_delay
Date: 2021-11-19 18:36:49
Message-ID: CAJsHxiCdyxXxNBTPV5cbAvXWZ_WQBnsG1obiU-_sBeOf__WL+w@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

Moving to bugs list.

On Tue, Oct 26, 2021 at 11:23 AM Scott Mead <scott(at)meads(dot)us> wrote:

>
>
> On Wed, May 26, 2021 at 4:01 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> wrote:
>
>> On Wed, Apr 14, 2021 at 11:17 PM Mead, Scott <meads(at)amazon(dot)com> wrote:
>> >
>> >
>> >
>> > > On Mar 1, 2021, at 8:43 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
>> wrote:
>> > >
>> > > On Mon, Feb 8, 2021 at 11:49 PM Mead, Scott <meads(at)amazon(dot)com> wrote:
>> > >>
>> > >> Hello,
>> > >> I recently looked at what it would take to make a running
>> autovacuum pick up a change to either cost_delay or cost_limit. Users
>> frequently have a conservative value set and then wish to change it when
>> autovacuum initiates a freeze on a relation. Most users only find out
>> they are in ‘to prevent wraparound’ territory after it has happened,
>> which means that if they want the vacuum to take advantage of more I/O,
>> they need to stop and then restart the currently running vacuum (after
>> reloading the GUCs).
>> > >>
>> > >> Initially, my goal was to determine the feasibility of making this
>> dynamic. I added debug code to vacuum.c:vacuum_delay_point(void) and found
>> that changes to cost_delay and cost_limit are already processed by a
>> running vacuum. However, there is a bug that prevents cost_delay or
>> cost_limit from being reconfigured to allow higher throughput.
>> > >>
>> > >> I believe this is a bug because currently, autovacuum will
>> dynamically detect and apply changes to cost_limit or cost_delay, but it
>> can never relax those values beyond their setting when the vacuum began.
>> The current behavior is that the maximum throughput of currently running
>> vacuum processes is limited to the cost_limit that was set when the
>> vacuum process began.
>> > >
>> > > Thanks for your report.
>> > >
>> > > I've not looked at the patch yet, but I agree that the calculation
>> > > for the autovacuum cost delay does not work correctly if
>> > > vacuum-delay-related parameters (e.g., autovacuum_vacuum_cost_delay)
>> > > are changed while a table is being vacuumed in order to speed up
>> > > running autovacuums. Here is my analysis:
>> >
>> >
>> > I appreciate your in-depth analysis and will comment in-line. That
>> said, I still think it’s important that the attached patch is applied. As
>> it stands today, a few lines of code prevent users from increasing the
>> throughput of running vacuums without having to cancel them first.
>> >
>> > The patch that I’ve provided allows users to decrease their
>> vacuum_cost_delay and get an immediate performance boost in their
>> running vacuum jobs.
>> >
>> >
>> > >
>> > > Suppose we have the following parameters and 3 autovacuum workers are
>> > > running on different tables:
>> > >
>> > > autovacuum_vacuum_cost_delay = 100
>> > > autovacuum_vacuum_cost_limit = 100
>> > >
>> > > Vacuum cost-based delay parameters for each worker are as follows:
>> > >
>> > > worker->wi_cost_limit_base = 100
>> > > worker->wi_cost_limit = 66
>> > > worker->wi_cost_delay = 100
>>
>> Sorry, worker->wi_cost_limit should be 33.
>>
>> > >
>> > > Each running autovacuum has "wi_cost_limit = 66" because the total
>> > > limit (100) is equally rationed. Another point is that the total
>> > > wi_cost_limit (198 = 66*3) is less than autovacuum_vacuum_cost_limit,
>> > > 100. Both of these are fine.
>>
>> So the total wi_cost_limit, 99, is less than
>> autovacuum_vacuum_cost_limit, 100.
>>
>> > >
>> > > Here let's change the autovacuum_vacuum_cost_delay/limit values to
>> > > speed up running autovacuums.
>> > >
>> > > Case 1: increasing autovacuum_vacuum_cost_limit to 1000.
>> > >
>> > > After reloading the configuration file, vacuum cost-based delay
>> > > parameters for each worker become as follows:
>> > >
>> > > worker->wi_cost_limit_base = 100
>> > > worker->wi_cost_limit = 100
>> > > worker->wi_cost_delay = 100
>> > >
>> > > If we rationed autovacuum_vacuum_cost_limit, 1000, to 3 workers, each
>> > > would get 333. But since we cap it at wi_cost_limit_base, the
>> > > wi_cost_limit is 100. I think this is what Mead reported here.
>> >
>> >
>> > Yes, this is exactly correct. The cost_limit is capped at the
>> cost_limit that was in effect when the running vacuum started. My patch
>> changes this cap to be the maximum allowed cost_limit (10,000).
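
To make the arithmetic above easy to reproduce, here is a tiny
standalone model of the share calculation done by autovac_balance_cost()
in autovacuum.c (PG 14-era). The function and names below are purely
illustrative; only the formula mirrors the server's:

    #include <stdio.h>

    /* Illustrative model of the per-worker limit computed by
     * autovac_balance_cost(); all names here are made up. */
    static int
    balanced_limit(int gl_limit, double gl_delay,
                   int limit_base, double delay_base, int nworkers)
    {
        /* total "cost capacity" contributed by all active workers */
        double  cost_total = nworkers * ((double) limit_base / delay_base);

        /* capacity made available by the global settings */
        double  cost_avail = (double) gl_limit / gl_delay;

        /* this worker's proportional share... */
        int     limit = (int) (cost_avail * limit_base / cost_total);

        /* ...capped at the base limit frozen at vacuum start, with a
         * floor of 1 to avoid division by zero in the delay code */
        if (limit > limit_base)
            limit = limit_base;
        if (limit < 1)
            limit = 1;
        return limit;
    }

    int
    main(void)
    {
        /* 3 workers, limit 100, delay 100: each gets 33 (total 99) */
        printf("%d\n", balanced_limit(100, 100.0, 100, 100.0, 3));

        /* Case 1: limit raised to 1000; the raw share would be 333,
         * but the cap at limit_base holds it to 100 */
        printf("%d\n", balanced_limit(1000, 100.0, 100, 100.0, 3));
        return 0;
    }

This prints 33 and then 100, matching the corrected numbers above and
the capped Case 1 result.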
>>
>> The comment of worker's limit calculation says:
>>
>> /*
>> * We put a lower bound of 1 on the cost_limit, to avoid division-
>> * by-zero in the vacuum code. Also, in case of roundoff trouble
>> * in these calculations, let's be sure we don't ever set
>> * cost_limit to more than the base value.
>> */
>> worker->wi_cost_limit = Max(Min(limit,
>>                                 worker->wi_cost_limit_base),
>>                             1);
>>
>> If we use the max cost_limit as the upper bound here, couldn't the
>> worker's limit unnecessarily end up higher than the base value in case
>> of roundoff trouble? I think that the problem here is rather that we
>> don't update wi_cost_limit_base and wi_cost_delay when rebalancing the
>> cost.
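
If I'm reading the alternative correctly (refresh each worker's base
values from the current GUCs when rebalancing, rather than raising the
cap), its shape would be something like the hypothetical sketch below.
This is my guess at the intent, not code from any posted patch, and it
ignores per-table reloptions:

    /* Hypothetical: refresh the frozen base values from the (possibly
     * reloaded) GUCs before computing the shares, so that the cap
     * tracks the current settings instead of the values captured at
     * vacuum start. */
    worker->wi_cost_limit_base = (autovacuum_vac_cost_limit > 0 ?
                                  autovacuum_vac_cost_limit :
                                  VacuumCostLimit);
    worker->wi_cost_delay = (autovacuum_vac_cost_delay >= 0 ?
                             autovacuum_vac_cost_delay :
                             VacuumCostDelay);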
>>
>
> Currently, vacuum always limits you to the cost_limit_base from the time
> that your vacuum started. I'm not sure why; I don't believe it's rounding
> related, because the rest of the rebalancing code works properly. ISTM
> that simply allowing the updated cost_limit is a simple solution, since
> the rebalance code will automatically take it into account.
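
Concretely, the change I'm proposing is small. The attached patch is
authoritative; purely as a sketch of its shape, reusing the code quoted
above, it amounts to something like:

    /* Sketch only: cap the rebalanced limit at the GUC maximum (10000)
     * rather than at the value in effect when the vacuum started, so
     * that a reloaded, higher cost_limit can actually take effect. */
    worker->wi_cost_limit = Max(Min(limit, 10000), 1);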
>
>>
>> Regards,
>>
>> --
>> Masahiko Sawada
>> EDB: https://www.enterprisedb.com/
>
> --
> Scott Mead
> scott(at)meads(dot)us
>

--
Scott Mead
scott(at)meads(dot)us
