Re: Should vacuum process config file reload more often

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, amit(dot)kapila16(at)gmail(dot)com
Subject: Re: Should vacuum process config file reload more often
Date: 2023-04-04 08:26:58
Message-ID: CAD21AoC5aDwARiqsL+KwHqnN7phub9AaMkbGkJ9aUCeETx8esw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 4, 2023 at 1:41 AM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
>
> On Sun, Apr 2, 2023 at 10:28 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > Thank you for updating the patches. Here are comments for 0001, 0002,
> > and 0003 patches:
>
> Thanks for the review!
>
> v13 attached with requested updates.
>
> > 0001:
> >
> > @@ -391,7 +389,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
> > Assert(params->index_cleanup != VACOPTVALUE_UNSPECIFIED);
> > Assert(params->truncate != VACOPTVALUE_UNSPECIFIED &&
> > params->truncate != VACOPTVALUE_AUTO);
> > - vacrel->failsafe_active = false;
> > + VacuumFailsafeActive = false;
> >
> > If we go with the idea of using VacuumCostActive +
> > VacuumFailsafeActive, we need to make sure that both are cleared at
> > the end of the vacuum per table. Since the patch clears it only here,
> > it remains true even after vacuum() if we trigger the failsafe mode
> > for the last table in the table list.
> >
> > In addition to that, to ensure that also in an error case, I think we
> > need to clear it also in PG_FINALLY() block in vacuum().
>
> So, in 0001, I tried to keep it exactly the same as
> LVRelState->failsafe_active except for it being a global. We don't
> actually use VacuumFailsafeActive in this commit except in vacuumlazy.c,
> which does its own management of the value (it resets it to false at the
> top of heap_vacuum_rel()).
>
> In the later commit which references VacuumFailsafeActive outside of
> vacuumlazy.c, I had reset it in PG_FINALLY(). I hadn't reset it in the
> relation list loop in vacuum(). Autovacuum calls vacuum() for each
> relation. However, you are right that for VACUUM with a list of
> relations for a table access method other than heap, once set to true,
> if the table AM forgets to reset the value to false at the end of
> vacuuming the relation, it would stay true.
>
> I've set it to false now at the bottom of the loop through relations in
> vacuum().

Agreed. Probably we can merge 0001 into 0003 but I leave it to
committers. The 0001 patch mostly looks good to me except for one
point:

@@ -391,7 +389,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
Assert(params->index_cleanup != VACOPTVALUE_UNSPECIFIED);
Assert(params->truncate != VACOPTVALUE_UNSPECIFIED &&
params->truncate != VACOPTVALUE_AUTO);
- vacrel->failsafe_active = false;
+ VacuumFailsafeActive = false;
vacrel->consider_bypass_optimization = true;
vacrel->do_index_vacuuming = true;

Looking at the 0003 patch, we set VacuumFailsafeActive to false per table:

+ /*
+ * Ensure VacuumFailsafeActive has been reset
before vacuuming the
+ * next relation relation.
+ */
+ VacuumFailsafeActive = false;

Given that we ensure it's reset before vacuuming the next table, do we
need to reset it in heap_vacuum_rel?

(there is a typo; s/relation relation/relation/)

>
> > ---
> > @@ -306,6 +306,7 @@ extern PGDLLIMPORT pg_atomic_uint32
> > *VacuumSharedCostBalance;
> > extern PGDLLIMPORT pg_atomic_uint32 *VacuumActiveNWorkers;
> > extern PGDLLIMPORT int VacuumCostBalanceLocal;
> >
> > +extern bool VacuumFailsafeActive;
> >
> > Do we need PGDLLIMPORT for VacuumFailSafeActive?
>
> I didn't add one because I thought extensions and other code probably
> shouldn't access this variable. I thought PGDLLIMPORT was only needed
> for extensions built on windows to access variables.

Agreed.

>
> > 0002:
> >
> > @@ -2388,6 +2398,7 @@ vac_max_items_to_alloc_size(int max_items)
> > return offsetof(VacDeadItems, items) +
> > sizeof(ItemPointerData) * max_items;
> > }
> >
> > +
> > /*
> > * vac_tid_reaped() -- is a particular tid deletable?
> > *
> >
> > Unnecessary new line. There are some other unnecessary new lines in this patch.
>
> Thanks! I think I got them all.
>
> > ---
> > @@ -307,6 +309,8 @@ extern PGDLLIMPORT pg_atomic_uint32 *VacuumActiveNWorkers;
> > extern PGDLLIMPORT int VacuumCostBalanceLocal;
> >
> > extern bool VacuumFailsafeActive;
> > +extern int VacuumCostLimit;
> > +extern double VacuumCostDelay;
> >
> > and
> >
> > @@ -266,8 +266,6 @@ extern PGDLLIMPORT int max_parallel_maintenance_workers;
> > extern PGDLLIMPORT int VacuumCostPageHit;
> > extern PGDLLIMPORT int VacuumCostPageMiss;
> > extern PGDLLIMPORT int VacuumCostPageDirty;
> > -extern PGDLLIMPORT int VacuumCostLimit;
> > -extern PGDLLIMPORT double VacuumCostDelay;
> >
> > Do we need PGDLLIMPORT too?
>
> I was on the fence about this. I annotated the new guc variables
> vacuum_cost_delay and vacuum_cost_limit with PGDLLIMPORT, but I did not
> annotate the variables used in vacuum code (VacuumCostLimit/Delay). I
> think whether or not this is the right choice depends on two things:
> whether or not my understanding of PGDLLIMPORT is correct and, if it is,
> whether or not we want extensions to be able to access
> VacuumCostLimit/Delay or if just access to the guc variables is
> sufficient/desirable.

I guess it would be better to keep both accessible for backward
compatibility. Extensions are able to access both GUC values and
values that are actually used for vacuum delays (as we used to use the
same variables).

>
> > ---
> > @@ -1773,20 +1773,33 @@ FreeWorkerInfo(int code, Datum arg)
> > }
> > }
> >
> > +
> > /*
> > - * Update the cost-based delay parameters, so that multiple workers consume
> > - * each a fraction of the total available I/O.
> > + * Update vacuum cost-based delay-related parameters for autovacuum workers and
> > + * backends executing VACUUM or ANALYZE using the value of relevant gucs and
> > + * global state. This must be called during setup for vacuum and after every
> > + * config reload to ensure up-to-date values.
> > */
> > void
> > -AutoVacuumUpdateDelay(void)
> > +VacuumUpdateCosts(void
> >
> > Isn't it better to define VacuumUpdateCosts() in vacuum.c rather than
> > autovacuum.c as this is now a common code for both vacuum and
> > autovacuum?
>
> We can't access members of WorkerInfoData from inside vacuum.c

Oops, you're right.

>
> > 0003:
> >
> > @@ -501,9 +502,9 @@ vacuum(List *relations, VacuumParams *params,
> > {
> > ListCell *cur;
> >
> > - VacuumUpdateCosts();
> > in_vacuum = true;
> > - VacuumCostActive = (VacuumCostDelay > 0);
> > + VacuumFailsafeActive = false;
> > + VacuumUpdateCosts();
> >
> > Hmm, if we initialize VacuumFailsafeActive here, should it be included
> > in 0001 patch?
>
> See comment above. This is the first patch where we use or reference it
> outside of vacuumlazy.c
>
> > ---
> > + if (VacuumCostDelay > 0)
> > + VacuumCostActive = true;
> > + else
> > + {
> > + VacuumCostActive = false;
> > + VacuumCostBalance = 0;
> > + }
> >
> > I agree to update VacuumCostActive in VacuumUpdateCosts(). But if we
> > do that I think this change should be included in 0002 patch.
>
> I'm a bit hesitant to do this because in 0002 VacuumCostActive cannot
> change status while vacuuming a table or even between tables for VACUUM
> when a list of relations is specified (except for being disabled by
> failsafe mode) Adding it to VacuumUpdateCosts() in 0003 makes it clear
> that it could change while vacuuming a table, so we must update it.
>

Agreed.

> I previously had 0002 introduce AutoVacuumUpdateLimit(), which only
> updated VacuumCostLimit with wi_cost_limit for autovacuum workers and
> then called that in vacuum_delay_point() (instead of
> AutoVacuumUpdateDelay() or VacuumUpdateCosts()). I abandoned that idea
> in favor of the simplicity of having VacuumUpdateCosts() just update
> those variables for everyone, since it could be reused in 0003.
>
> Now, I'm thinking the previous method might be more clear?
> Or is what I have okay?

I'm fine with the current one.

>
> > ---
> > + if (ConfigReloadPending && !analyze_in_outer_xact)
> > + {
> > + ConfigReloadPending = false;
> > + ProcessConfigFile(PGC_SIGHUP);
> > + VacuumUpdateCosts();
> > + }
> >
> > Since analyze_in_outer_xact is false by default, we reload the config
> > file in vacuum_delay_point() by default. We need to note that
> > vacuum_delay_point() could be called via other paths, for example
> > gin_cleanup_pending_list() and ambulkdelete() called by
> > validate_index(). So it seems to me that we should do the opposite; we
> > have another global variable, say vacuum_can_reload_config, which is
> > false by default, and is set to true only when vacuum() allows it. In
> > vacuum_delay_point(), we reload the config file iff
> > (ConfigReloadPending && vacuum_can_reload_config).
>
> Wow, great point. Thanks for catching this. I've made the update you
> suggested. I also set vacuum_can_reload_config to false in PG_FINALLY()
> in vacuum().

Here are some review comments for 0002-0004 patches:

0002:
- if (MyWorkerInfo)
+ if (am_autovacuum_launcher)
+ return;
+
+ if (am_autovacuum_worker)
{
- VacuumCostDelay = MyWorkerInfo->wi_cost_delay;
VacuumCostLimit = MyWorkerInfo->wi_cost_limit;
+ VacuumCostDelay = MyWorkerInfo->wi_cost_delay;
+ }

Isn't it a bit safer to check MyWorkerInfo instead of
am_autovacuum_worker? Also, I don't think there is any reason why we
want to exclude only the autovacuum launcher.

---
+ * TODO: should VacuumCostLimit and VacuumCostDelay be initialized to valid or
+ * invalid values?

How about using the default value of normal backends, 200 and 0?

0003:

@@ -83,6 +84,7 @@ int vacuum_cost_limit;
*/
int VacuumCostLimit = 0;
double VacuumCostDelay = -1;
+static bool vacuum_can_reload_config = false;

In vacuum.c, we use snake case for GUC parameters and camel case for
other global variables, so it seems better to rename it
VacuumCanReloadConfig. Sorry, that's my fault.

0004:

+ if (tab->at_dobalance)
+ pg_atomic_test_set_flag(&MyWorkerInfo->wi_dobalance);
+ else

The comment of pg_atomic_test_set_flag() says that it returns false if
the flag has not successfully been set:

* pg_atomic_test_set_flag - TAS()
*
* Returns true if the flag has successfully been set, false otherwise.
*
* Acquire (including read barrier) semantics.

But IIUC we don't need to worry about that as only one process updates
the flag, right? It might be a good idea to add some comments why we
don't need to check the return value.

---
- if (worker->wi_proc != NULL)
- elog(DEBUG2, "autovac_balance_cost(pid=%d
db=%u, rel=%u, dobalance=%s cost_limit=%d, cost_limit_base=%d,
cost_delay=%g)",
- worker->wi_proc->pid,
worker->wi_dboid, worker->wi_tableoid,
- worker->wi_dobalance ? "yes" : "no",
- worker->wi_cost_limit,
worker->wi_cost_limit_base,
- worker->wi_cost_delay);

I think it's better to keep this kind of log in some form for
debugging. For example, we can show these values of autovacuum workers
in VacuumUpdateCosts().

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Drouvot, Bertrand 2023-04-04 09:42:45 Re: Minimal logical decoding on standbys
Previous Message John Naylor 2023-04-04 08:12:10 Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound