Re: [BUG] Autovacuum not dynamically decreasing cost_limit and cost_delay

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Mead, Scott" <meads(at)amazon(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUG] Autovacuum not dynamically decreasing cost_limit and cost_delay
Date: 2021-03-02 01:43:08
Message-ID: CAD21AoAju40dRSvsTx9BtcR1vza3qYogU3=AKWnfrpQ79Q_7mg@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Mon, Feb 8, 2021 at 11:49 PM Mead, Scott <meads(at)amazon(dot)com> wrote:
>
> Hello,
> I recently looked at what it would take to make a running autovacuum pick up a change to either cost_delay or cost_limit. Users frequently have a conservative value set, and then wish to change it when autovacuum initiates a freeze on a relation. Most users find out they are in ‘to prevent wraparound’ only after it has happened; this means that if they want the vacuum to take advantage of more I/O, they need to stop and then restart the currently running vacuum (after reloading the GUCs).
>
> Initially, my goal was to determine the feasibility of making this dynamic. I added debug code to vacuum.c:vacuum_delay_point(void) and found that changes to cost_delay and cost_limit are already processed by a running vacuum. However, there was a bug preventing cost_delay or cost_limit from being configured to allow higher throughput.
>
> I believe this is a bug because currently, autovacuum will dynamically detect and increase the cost_limit or cost_delay, but it can never decrease those values beyond their setting when the vacuum began. The current behavior is for vacuum to limit the maximum throughput of currently running vacuum processes to the cost_limit that was set when the vacuum process began.

Thanks for your report.

I've not looked at the patch yet, but I agree that the calculation of
the autovacuum cost delay does not seem to work correctly if
vacuum-delay-related parameters (e.g., autovacuum_vacuum_cost_delay)
are changed while a table is being vacuumed in order to speed up the
running autovacuums. Here is my analysis:

Suppose we have the following parameters and 3 autovacuum workers are
running on different tables:

autovacuum_vacuum_cost_delay = 100
autovacuum_vacuum_cost_limit = 100

The vacuum cost-based delay parameters for each worker are as follows:

worker->wi_cost_limit_base = 100
worker->wi_cost_limit = 33
worker->wi_cost_delay = 100

Each running autovacuum has "wi_cost_limit = 33" because the total
limit (100) is rationed equally among the three workers. Also, the
total wi_cost_limit (99 = 33*3) does not exceed
autovacuum_vacuum_cost_limit (100). Both are fine.
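To double-check the rationing arithmetic, here is a simplified C model of the per-worker step of autovac_balance_cost(). It is a sketch, not PostgreSQL source: balanced_limit() and initial_share() are hypothetical helpers, and the model assumes every worker started with wi_cost_limit_base and wi_cost_delay equal to the GUC values.

```c
/*
 * Simplified model of the per-worker step of autovac_balance_cost().
 * Hypothetical helpers written for illustration; not PostgreSQL source.
 */
static int
balanced_limit(double cost_avail, int cost_limit_base, double cost_total)
{
    /* Ration cost_avail in proportion to this worker's base limit. */
    int limit = (int) (cost_avail * cost_limit_base / cost_total);

    /* Never exceed the worker's base limit, never go below 1. */
    if (limit > cost_limit_base)
        limit = cost_limit_base;
    if (limit < 1)
        limit = 1;
    return limit;
}

/* nworkers identical workers whose base limit and delay equal the GUCs. */
static int
initial_share(int vac_cost_limit, double vac_cost_delay, int nworkers)
{
    double cost_avail = (double) vac_cost_limit / vac_cost_delay;
    double cost_total = nworkers * ((double) vac_cost_limit / vac_cost_delay);

    return balanced_limit(cost_avail, vac_cost_limit, cost_total);
}
```

For example, initial_share(100, 100.0, 3) returns 33, so the three workers together use 99, within the configured limit of 100.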

Now let's change the autovacuum_vacuum_cost_delay/limit values to speed
up the running autovacuums.

Case 1: increasing autovacuum_vacuum_cost_limit to 1000.

After reloading the configuration file, vacuum cost-based delay
parameters for each worker become as follows:

worker->wi_cost_limit_base = 100
worker->wi_cost_limit = 100
worker->wi_cost_delay = 100

If we rationed autovacuum_vacuum_cost_limit (1000) among the 3
workers, each would get 333. But since the result is capped by
wi_cost_limit_base, wi_cost_limit is 100. I think this is what Mead
reported here.
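The Case 1 numbers can be reproduced with a small hypothetical helper (case1_share() and the Share struct are illustrative names, not PostgreSQL code). It assumes the workers still carry the base limit (100) and delay (100) they started with, and only autovacuum_vacuum_cost_limit was raised.

```c
/*
 * Case 1 arithmetic (hypothetical helper, not PostgreSQL source).  The
 * workers keep the base limit and delay they started with; only the
 * GUC autovacuum_vacuum_cost_limit has been raised.
 */
typedef struct
{
    int uncapped;   /* rationed share before the cap */
    int capped;     /* value that ends up in wi_cost_limit */
} Share;

static Share
case1_share(int new_cost_limit, double cost_delay, int base, int nworkers)
{
    double cost_avail = (double) new_cost_limit / cost_delay;
    double cost_total = nworkers * ((double) base / cost_delay);
    Share  s;

    s.uncapped = (int) (cost_avail * base / cost_total);
    /* Min(limit, wi_cost_limit_base), then Max(..., 1) */
    s.capped = s.uncapped < base ? s.uncapped : base;
    if (s.capped < 1)
        s.capped = 1;
    return s;
}
```

case1_share(1000, 100.0, 100, 3) gives an uncapped share of 333 but a capped wi_cost_limit of 100.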

Case 2: decreasing autovacuum_vacuum_cost_delay to 10.

After reloading the configuration file, vacuum cost-based delay
parameters for each worker become as follows:

worker->wi_cost_limit_base = 100
worker->wi_cost_limit = 100
worker->wi_cost_delay = 100

Actually, the result is the same as in case 1. But in this case, the
total cost limit across the three workers is 300, which is greater
than autovacuum_vacuum_cost_limit (100). This behavior violates what
the documentation says in the description of
autovacuum_vacuum_cost_limit:

---
Note that the value is distributed proportionally among the running
autovacuum workers, if there is more than one, so that the sum of the
limits for each worker does not exceed the value of this variable.
---

It seems to me that these problems come from the fact that we change
neither wi_cost_limit_base nor wi_cost_delay while autovacuuming a
table, even though we use autovacuum_vac_cost_limit/delay to calculate
cost_avail. This incorrect calculation persists until all running
autovacuum workers finish their current vacuums. When a worker starts
to process a new table, it resets both wi_cost_limit_base and
wi_cost_delay.
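The following sketch models that mismatch for Case 2: balancing uses the current GUCs for cost_avail but the stale per-worker base limit and delay for cost_total. SimWorker, sim_balance() and sum_limits() are hypothetical names; the struct fields only mirror the wi_* fields discussed here, and this is not PostgreSQL source.

```c
/* Hypothetical model of the per-worker state; not PostgreSQL source. */
typedef struct
{
    int    cost_limit_base;  /* like wi_cost_limit_base: set at table start */
    int    cost_limit;       /* like wi_cost_limit: rewritten by balancing */
    double cost_delay;       /* like wi_cost_delay: set at table start */
} SimWorker;

/* Balance with the current GUCs but the workers' stored base and delay. */
static void
sim_balance(SimWorker *w, int n, int vac_cost_limit, double vac_cost_delay)
{
    double cost_avail = (double) vac_cost_limit / vac_cost_delay;
    double cost_total = 0.0;

    for (int i = 0; i < n; i++)
        cost_total += (double) w[i].cost_limit_base / w[i].cost_delay;
    if (cost_total <= 0)
        return;

    for (int i = 0; i < n; i++)
    {
        int limit = (int) (cost_avail * w[i].cost_limit_base / cost_total);

        if (limit > w[i].cost_limit_base)
            limit = w[i].cost_limit_base;
        w[i].cost_limit = limit < 1 ? 1 : limit;
    }
}

/* Sum of the per-worker limits, to compare with the configured limit. */
static int
sum_limits(const SimWorker *w, int n)
{
    int sum = 0;

    for (int i = 0; i < n; i++)
        sum += w[i].cost_limit;
    return sum;
}
```

With three workers that started at base 100 and delay 100, calling sim_balance(w, 3, 100, 10.0) after the delay is lowered to 10 leaves each wi_cost_limit at 100 (and each stored delay at 100), for a total of 300.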

Looking at autovac_balance_cost(), it uses each worker's
wi_cost_limit_base to calculate the total base cost limit of the
participating active workers as follows:

    cost_total +=
        (double) worker->wi_cost_limit_base / worker->wi_cost_delay;
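A sketch of how that loop picks the participating workers, assuming the filter checks the balance flag and a positive base limit and delay (the wi_proc check is omitted). The SimWorker struct and sim_cost_total() are hypothetical; dobalance stands in for wi_dobalance.

```c
#include <stdbool.h>

/* Hypothetical worker model; not PostgreSQL source. */
typedef struct
{
    bool   dobalance;        /* like wi_dobalance: false with per-table settings */
    int    cost_limit_base;  /* like wi_cost_limit_base */
    double cost_delay;       /* like wi_cost_delay */
} SimWorker;

/* Sum of base-limit/delay ratios over the workers that take part in balancing. */
static double
sim_cost_total(const SimWorker *w, int n)
{
    double cost_total = 0.0;

    for (int i = 0; i < n; i++)
    {
        /* Assumed filter (the wi_proc != NULL check is left out). */
        if (w[i].dobalance && w[i].cost_limit_base > 0 && w[i].cost_delay > 0)
            cost_total += (double) w[i].cost_limit_base / w[i].cost_delay;
    }
    return cost_total;
}
```

With two balanced workers at 100/100 and one worker with per-table settings, cost_total is 2.0; the third worker contributes nothing.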

But what is the point of calculating it on the assumption that each
worker may have a different cost limit? Since commit 1021bd6a8 (2014),
workers vacuuming a table whose cost parameters are set individually
have at_dobalance false and don't participate in this calculation. So
I wonder if we can just assume that all participating workers (those
with at_dobalance true) have the same cost_limit and cost_delay. If we
can do that, I guess we no longer need wi_cost_limit_base.
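If all participating workers used the current GUC values, cost_total would be nbalanced * cost_avail, so the rationed limit reduces to vac_cost_limit / nbalanced. The hypothetical helpers below compare the current ratio-based formula with such an equal split; this only illustrates the idea and is not a proposed patch. Floating-point truncation in the ratio-based form could in principle differ by one from integer division in edge cases; the values checked here agree.

```c
/*
 * Hypothetical comparison, not PostgreSQL source.  current_formula()
 * applies the ratio-based rationing to nbalanced workers whose base limit
 * and delay equal the current GUCs; simplified_limit() splits the limit
 * equally without any per-worker base value.
 */
static int
current_formula(int vac_cost_limit, double vac_cost_delay, int nbalanced)
{
    double cost_avail = (double) vac_cost_limit / vac_cost_delay;
    double cost_total = nbalanced * ((double) vac_cost_limit / vac_cost_delay);
    int    limit = (int) (cost_avail * vac_cost_limit / cost_total);

    if (limit > vac_cost_limit)
        limit = vac_cost_limit;
    return limit < 1 ? 1 : limit;
}

static int
simplified_limit(int vac_cost_limit, int nbalanced)
{
    int limit = vac_cost_limit / nbalanced;   /* equal split */

    return limit < 1 ? 1 : limit;
}
```

For instance, both give 33 for a limit of 100 shared by 3 workers, and 333 for a limit of 1000 shared by 3 workers.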

Also, we don't change wi_cost_delay while vacuuming a table, which
seems wrong to me. autovac_balance_cost() could also update each
worker's wi_cost_delay, which would eventually be applied to
VacuumCostDelay.
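A minimal sketch of the balancing step with the stale delay refreshed first, under the assumption that only the delay changed (Case 2). SimWorker and sim_balance_refreshing_delay() are hypothetical names, not PostgreSQL code.

```c
/* Hypothetical model; not PostgreSQL source. */
typedef struct
{
    int    cost_limit_base;  /* like wi_cost_limit_base */
    int    cost_limit;       /* like wi_cost_limit */
    double cost_delay;       /* like wi_cost_delay */
} SimWorker;

/*
 * Variant of the balancing step that first refreshes each worker's delay
 * from the current GUC value.  In the proposal, the refreshed value would
 * eventually be applied to VacuumCostDelay in the worker process.
 */
static void
sim_balance_refreshing_delay(SimWorker *w, int n, int vac_cost_limit,
                             double vac_cost_delay)
{
    double cost_avail = (double) vac_cost_limit / vac_cost_delay;
    double cost_total = 0.0;

    for (int i = 0; i < n; i++)
    {
        w[i].cost_delay = vac_cost_delay;   /* refresh the stale delay */
        cost_total += (double) w[i].cost_limit_base / w[i].cost_delay;
    }
    if (cost_total <= 0)
        return;

    for (int i = 0; i < n; i++)
    {
        int limit = (int) (cost_avail * w[i].cost_limit_base / cost_total);

        if (limit > w[i].cost_limit_base)
            limit = w[i].cost_limit_base;
        w[i].cost_limit = limit < 1 ? 1 : limit;
    }
}
```

After lowering the delay to 10, each worker gets wi_cost_limit = 33 and wi_cost_delay = 10, so the total (99) again stays within autovacuum_vacuum_cost_limit. Refreshing the delay alone does not help Case 1, where the stale value is wi_cost_limit_base.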

What do you think?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
