Re: Should vacuum process config file reload more often

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: Should vacuum process config file reload more often
Date: 2023-04-05 13:10:55
Message-ID: CBF0083E-5893-4B35-9498-142A835D03DC@yesql.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 4 Apr 2023, at 22:04, Melanie Plageman <melanieplageman(at)gmail(dot)com> wrote:
>
> On Tue, Apr 4, 2023 at 4:27 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

>> Also, I don't think there is any reason why we want to exclude only
>> the autovacuum launcher.
>
> My rationale is that the launcher is the only other process type which
> might reasonably be executing this code besides autovac workers, client
> backends doing VACUUM/ANALYZE, and parallel vacuum workers. Is it
> confusing to have the launcher have VacuumCostLimt and VacuumCostDelay
> set to the guc values for explicit VACUUM and ANALYZE -- even if the
> launcher doesn't use these variables?
>
> I've removed the check, because I do agree with you that it may be
> unnecessarily confusing in the code.

+1

> On Tue, Apr 4, 2023 at 9:36 AM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
>>> On 4 Apr 2023, at 00:35, Melanie Plageman <melanieplageman(at)gmail(dot)com> wrote:

>> Thinking more on this I'm leaning towards going with allowing more frequent
>> reloads in autovacuum, and saving the same for VACUUM for more careful study.
>> The general case is probably fine but I'm not convinced that there aren't error
>> cases which can present unpleasant scenarios.
>
> In attached v15, I've dropped support for VACUUM and non-nested ANALYZE.
> It is like a 5 line change and could be added back at any time.

I think thats the best option for now.

>> +extern int VacuumCostLimit;
>> +extern double VacuumCostDelay;
>> ...
>> -extern PGDLLIMPORT int VacuumCostLimit;
>> -extern PGDLLIMPORT double VacuumCostDelay;
>>
>> Same with these, I don't think this is according to our default visibility.
>> Moreover, I'm not sure it's a good idea to perform this rename. This will keep
>> VacuumCostLimit and VacuumCostDelay exported, but change their meaning. Any
>> external code referring to these thinking they are backing the GUCs will still
>> compile, but may be broken in subtle ways. Is there a reason for not keeping
>> the current GUC variables and instead add net new ones?
>
> When VacuumCostLimit was the same variable in the code and for the GUC
> vacuum_cost_limit, everytime we reload the config file, VacuumCostLimit
> is overwritten. Autovacuum workers have to overwrite this value with the
> appropriate one for themselves given the balancing logic and the value
> of autovacuum_vacuum_cost_limit. However, the problem is, because you
> can specify -1 for autovacuum_vacuum_cost_limit to indicate it should
> fall back to vacuum_cost_limit, we have to reference the value of
> VacuumCostLimit when calculating the new autovacuum worker's cost limit
> after a config reload.
>
> But, you have to be sure you *only* do this after a config reload when
> the value of VacuumCostLimit is fresh and unmodified or you risk
> dividing the value of VacuumCostLimit over and over. That means it is
> unsafe to call functions updating the cost limit more than once.
>
> This orchestration wasn't as difficult when we only reloaded the config
> file once every table. We were careful about it and also kept the
> original "base" cost limit around from table_recheck_autovac(). However,
> once we started reloading the config file more often, this no longer
> works.
>
> By separating the variables modified when the gucs are set and the ones
> used the code, we can make sure we always have the original value the
> guc was set to in vacuum_cost_limit and autovacuum_vacuum_cost_limit,
> whenever we need to reference it.
>
> That being said, perhaps we should document what extensions should do?
> Do you think they will want to use the variables backing the gucs or to
> be able to overwrite the variables being used in the code?

I think I wasn't clear in my comment, sorry. I don't have a problem with
introducing a new variable to split the balanced value from the GUC value.
What I don't think we should do is repurpose an exported symbol into doing a
new thing. In the case at hand I think VacuumCostLimit and VacuumCostDelay
should remain the backing variables for the GUCs, with vacuum_cost_limit and
vacuum_cost_delay carrying the balanced values. So the inverse of what is in
the patch now.

The risk of these symbols being used in extensions might be very low but on
principle it seems unwise to alter a symbol and risk subtle breakage.

> Oh, also I've annotated these with PGDLLIMPORT too.
>
>> + * TODO: should VacuumCostLimit and VacuumCostDelay be initialized to valid or
>> + * invalid values?
>> + */
>> +int VacuumCostLimit = 0;
>> +double VacuumCostDelay = -1;
>>
>> I think the important part is to make sure they are never accessed without
>> VacuumUpdateCosts having been called first. I think that's the case here, but
>> it's not entirely clear. Do you see a codepath where that could happen? If
>> they are initialized to a sentinel value we also need to check for that, so
>> initializing to the defaults from the corresponding GUCs seems better.
>
> I don't see a case where autovacuum could access these without calling
> VacuumUpdateCosts() first. I think the other callers of
> vacuum_delay_point() are the issue (gist/gin/hash/etc).
>
> It might need a bit more thought.
>
> My concern was that these variables correspond to multiple GUCs each
> depending on the backend type, and those backends have different
> defaults (e.g. autovac workers default cost delay is different than
> client backend doing vacuum cost delay).
>
> However, what I have done in this version is initialize them to the
> defaults for a client backend executing VACUUM or ANALYZE, since I am
> fairly confident that autovacuum will not use them without calling
> VacuumUpdateCosts().

Another question along these lines, we only call AutoVacuumUpdateLimit() in
case there is a sleep in vacuum_delay_point():

+ /*
+ * Balance and update limit values for autovacuum workers. We must
+ * always do this in case the autovacuum launcher or another
+ * autovacuum worker has recalculated the number of workers across
+ * which we must balance the limit. This is done by the launcher when
+ * launching a new worker and by workers before vacuuming each table.
+ */
+ AutoVacuumUpdateLimit();

Shouldn't we always call that in case we had a config reload, or am I being
thick?

>> +static double av_relopt_cost_delay = -1;
>> +static int av_relopt_cost_limit = 0;

Sorry, I didn't catch this earlier, shouldn't this be -1 to match the default
value of autovacuum_vacuum_cost_limit?

>> These need a comment IMO, ideally one that explain why they are initialized to
>> those values.
>
> I've added a comment.

+ * Variables to save the cost-related table options for the current relation

The "table options" nomenclature is right now only used for FDW foreign table
options, I think we should use "storage parameters" or "relation options" here.

>> + /* There is at least 1 autovac worker (this worker). */
>> + Assert(nworkers_for_balance > 0);
>>
>> Is there a scenario where this is expected to fail? If so I think this should
>> be handled and not just an Assert.
>
> No, this isn't expected to happen because an autovacuum worker would
> have called autovac_recalculate_workers_for_balance() before calling
> VacuumUpdateCosts() (which calls AutoVacuumUpdateLimit()) in
> do_autovacuum(). But, if someone were to move around or add a call to
> VacuumUpdateCosts() there is a chance it could happen.

Thinking more on this I'm tempted to recommend that we promote this to an
elog(), mainly due to the latter. An accidental call to VacuumUpdateCosts()
doesn't seem entirely unlikely to happen

--
Daniel Gustafsson

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-04-05 13:15:25 Re: Minimal logical decoding on standbys
Previous Message Benoit Lobréau 2023-04-05 13:00:53 Parallel Query Stats