Re: Should vacuum process config file reload more often

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: Should vacuum process config file reload more often
Date: 2023-03-15 05:13:24
Message-ID: CAD21AoC=BRMwUpVDf1tzDa13--dLhuq=c2XQ_KAYA4Qx3cS=kA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 11, 2023 at 8:11 AM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
>
> Quotes below are combined from two of Sawada-san's emails.
>
> I've also attached a patch with my suggested current version.
>
> On Thu, Mar 9, 2023 at 10:27 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Fri, Mar 10, 2023 at 11:23 AM Melanie Plageman
> > <melanieplageman(at)gmail(dot)com> wrote:
> > >
> > > On Tue, Mar 7, 2023 at 12:10 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Mon, Mar 6, 2023 at 5:26 AM Melanie Plageman
> > > > <melanieplageman(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Thu, Mar 2, 2023 at 6:37 PM Melanie Plageman
> > > > > In this version I've removed wi_cost_delay from WorkerInfoData. There is
> > > > > no synchronization of cost_delay amongst workers, so there is no reason
> > > > > to keep it in shared memory.
> > > > >
> > > > > One consequence of not updating VacuumCostDelay from wi_cost_delay is
> > > > > that we have to have a way to keep track of whether or not autovacuum
> > > > > table options are in use.
> > > > >
> > > > > This patch does this in a cringeworthy way. I added two global
> > > > > variables, one to track whether or not cost delay table options are in
> > > > > use and the other to store the value of the table option cost delay. I
> > > > > didn't want to use a single variable with a special value to indicate
> > > > > that table option cost delay is in use because
> > > > > autovacuum_vacuum_cost_delay already has special values that mean
> > > > > certain things. My code needs a better solution.
> > > >
> > > > While it's true that wi_cost_delay doesn't need to be shared, it seems
> > > > to make the logic somewhat complex. We need to handle cost_delay in a
> > > > different way from other vacuum-related parameters and we need to make
> > > > sure av[_use]_table_option_cost_delay are set properly. Removing
> > > > wi_cost_delay from WorkerInfoData saves 8 bytes shared memory per
> > > > autovacuum worker but it might be worth considering to keep
> > > > wi_cost_delay for simplicity.
> > >
> > > Ah, it turns out we can't really remove wi_cost_delay from WorkerInfo
> > > anyway because the launcher doesn't know anything about table options
> > > and so the workers have to keep an updated wi_cost_delay that the
> > > launcher or other autovac workers who are not vacuuming that table can
> > > read from when calculating the new limit in autovac_balance_cost().
> >
> > IIUC if any of the cost delay parameters has been set individually,
> > the autovacuum worker is excluded from the balance algorithm.
>
> Ah, yes! That's right. So it is not a problem. Then I still think
> removing wi_cost_delay from the worker info makes sense. wi_cost_delay
> is a double and can't easily be accessed atomically the way
> wi_cost_limit can be.
>
> Keeping the cost delay local to the backends also makes it clear that
> cost delay is not something that should be written to by other backends
> or that can differ from worker to worker. Without table options in the
> picture, the cost delay should be the same for any worker who has
> reloaded the config file.

Agreed.

>
> As for the cost limit safe access issue, maybe we can avoid a LWLock
> acquisition for reading wi_cost_limit by using an atomic similar to what
> you suggested here for "did_rebalance".
>
> > > I've added in a shared lock for reading from wi_cost_limit in this
> > > patch. However, AutoVacuumUpdateLimit() is called unconditionally in
> > > vacuum_delay_point(), which is called quite often (per block-ish), so I
> > > was trying to think if there is a way we could avoid having to check
> > > this shared memory variable on every call to vacuum_delay_point().
> > > Rebalances shouldn't happen very often (done by the launcher when a new
> > > worker is launched and by workers between vacuuming tables). Maybe we
> > > can read from it less frequently?
> >
> > Yeah, acquiring the lwlock for every call to vacuum_delay_point()
> > seems to be harmful. One idea would be to have one sig_atomic_t
> > variable in WorkerInfoData and autovac_balance_cost() set it to true
> > after rebalancing the worker's cost-limit. The worker can check it
> > without locking and update its delay parameters if the flag is true.
>
> Instead of having the atomic indicate whether or not someone (launcher
> or another worker) did a rebalance, it would simply store the current
> cost limit. Then the worker can normally access it with a simple read.
>
> My rationale is that if we used an atomic to indicate whether or not we
> did a rebalance ("did_rebalance"), we would have the same cache
> coherency guarantees as if we just used the atomic for the cost limit.
> If we read from the "did_rebalance" variable and missed someone having
> written to it on another core, we still wouldn't get around to checking
> the wi_cost_limit variable in shared memory, so it doesn't matter that
> we bothered to keep it in shared memory and use a lock to access it.
>
> I noticed we don't allow wi_cost_limit to ever be less than 0, so we
> could store wi_cost_limit in an atomic uint32.
>
> I'm not sure if it is okay to do pg_atomic_read_u32() and
> pg_atomic_unlocked_write_u32() or if we need pg_atomic_write_u32() in
> most cases.

I agree to use pg_atomic_uin32. Given that the comment of
pg_atomic_unlocked_write_u32() says:

* pg_atomic_compare_exchange_u32. This should only be used in cases where
* minor performance regressions due to atomics emulation are unacceptable.

I think pg_atomic_write_u32() is enough for our use case.

>
> I've implemented the atomic cost limit in the attached patch. Though,
> I'm pretty unsure about how I initialized the atomics in
> AutoVacuumShmemInit()...

+
/* initialize the WorkerInfo free list */
for (i = 0; i < autovacuum_max_workers; i++)
dlist_push_head(&AutoVacuumShmem->av_freeWorkers,
&worker[i].wi_links);
+
+ dlist_foreach(iter, &AutoVacuumShmem->av_freeWorkers)
+ pg_atomic_init_u32(
+
&(dlist_container(WorkerInfoData, wi_links, iter.cur))->wi_cost_limit,
+ 0);
+

I think we can do like:

/* initialize the WorkerInfo free list */
for (i = 0; i < autovacuum_max_workers; i++)
{
dlist_push_head(&AutoVacuumShmem->av_freeWorkers,
&worker[i].wi_links);
pg_atomic_init_u32(&(worker[i].wi_cost_limit));
}

>
> If the consensus is that it is simply too confusing to take
> wi_cost_delay out of WorkerInfo, we might be able to afford using a
> shared lock to access it because we won't call AutoVacuumUpdateDelay()
> on every invocation of vacuum_delay_point() -- only when we've reloaded
> the config file.
>
> One potential option to avoid taking a shared lock on every call to
> AutoVacuumUpdateDelay() is to set a global variable to indicate that we
> did update it (since we are the only ones updating it) and then only
> take the shared LWLock in AutoVacuumUpdateDelay() if that flag is true.
>

If we remove wi_cost_delay from WorkerInfo, probably we don't need to
acquire the lwlock in AutoVacuumUpdateDelay()? The shared field we
access in that function will be only wi_dobalance, but this field is
updated only by its owner autovacuum worker.

> > ---
> > void
> > AutoVacuumUpdateDelay(void)
> > {
> > - if (MyWorkerInfo)
> > + /*
> > + * We are using autovacuum-related GUCs to update
> > VacuumCostDelay, so we
> > + * only want autovacuum workers and autovacuum launcher to do this.
> > + */
> > + if (!(am_autovacuum_worker || am_autovacuum_launcher))
> > + return;
> >
> > Is there any case where the autovacuum launcher calls
> > AutoVacuumUpdateDelay() function?
>
> I had meant to add it to HandleAutoVacLauncherInterrupts() after
> reloading the config file (done in attached patch). When using the
> global variables for cost delay (instead of wi_cost_delay in worker
> info), the autovac launcher also has to do the check in the else branch
> of AutoVacuumUpdateDelay()
>
> VacuumCostDelay = autovacuum_vac_cost_delay >= 0 ?
> autovacuum_vac_cost_delay : VacuumCostDelay;
>
> to make sure VacuumCostDelay is correct for when it calls
> autovac_balance_cost().

But doesn't the launcher do a similar thing at the beginning of
autovac_balance_cost()?

double vac_cost_delay = (autovacuum_vac_cost_delay >= 0 ?
autovacuum_vac_cost_delay : VacuumCostDelay);

Related to this point, I think autovac_balance_cost() should use
globally-set cost_limit and cost_delay values to calculate worker's
vacuum-delay parameters. IOW, vac_cost_limit and vac_cost_delay should
come from the config file setting, not table option etc:

int vac_cost_limit = (autovacuum_vac_cost_limit > 0 ?
autovacuum_vac_cost_limit : VacuumCostLimit);
double vac_cost_delay = (autovacuum_vac_cost_delay >= 0 ?
autovacuum_vac_cost_delay : VacuumCostDelay);

If my understanding is right, the following change is not right;
AutoVacUpdateLimit() updates the VacuumCostLimit based on the value in
MyWorkerInfo:

MyWorkerInfo->wi_cost_limit_base = tab->at_vacuum_cost_limit;
+ AutoVacuumUpdateLimit();

/* do a balance */
autovac_balance_cost();

- /* set the active cost parameters from the result of that */
- AutoVacuumUpdateDelay();

Also, even when using the global variables for cost delay, the
launcher doesn't need to check the global variable. It should always
be able to use either autovacuum_vac_cost_delay/limit or
VacuumCostDelay/Limit.

>
> This also made me think about whether or not we still need cost_limit_base.
> It is used to ensure that autovac_balance_cost() never ends up setting
> workers' wi_cost_limits above the current autovacuum_vacuum_cost_limit
> (or VacuumCostLimit). However, the launcher and all the workers should
> know what the value is without cost_limit_base, no?

Yeah, the current balancing algorithm looks to respect the cost_limit
value set when starting to vacuum the table. The proportion of the
amount of I/O that a worker can consume is calculated based on the
base value and the new worker's cost_limit value cannot exceed the
base value. Given that we're trying to dynamically tune worker's cost
parameters (delay and limit), this concept seems to need to be
updated.

>
> > > Also not sure how the patch interacts with failsafe autovac and parallel
> > > vacuum.
> >
> > Good point.
> >
> > When entering the failsafe mode, we disable the vacuum delays (see
> > lazy_check_wraparound_failsafe()). We need to keep disabling the
> > vacuum delays even after reloading the config file. One idea is to
> > have another global variable indicating we're in the failsafe mode.
> > vacuum_delay_point() doesn't update VacuumCostActive if the flag is
> > true.
>
> I think we might not need to do this. Other than in
> lazy_check_wraparound_failsafe(), VacuumCostActive is only updated in
> two places:
>
> 1) in vacuum() which autovacuum will call per table. And failsafe is
> reset per table as well.
>
> 2) in vacuum_delay_point(), but, since VacuumCostActive will already be
> false when we enter vacuum_delay_point() the next time after
> lazy_check_wraparound_failsafe(), we won't set VacuumCostActive there.

Indeed. But does it mean that there is no code path to turn
vacuum-delay on, even when vacuum_cost_delay is updated from 0 to
non-0?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-03-15 05:33:49 Re: Allow logical replication to copy tables in binary format
Previous Message Thomas Munro 2023-03-15 04:59:35 Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken