Re: Should vacuum process config file reload more often

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, amit(dot)kapila16(at)gmail(dot)com
Subject: Re: Should vacuum process config file reload more often
Date: 2023-04-03 02:27:42
Message-ID: CAD21AoDi4i4tj1Cf9BG0q2iyVhHisQ2xxVBZajEqExX6t6_b3g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Apr 1, 2023 at 4:09 AM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
>
> On Fri, Mar 31, 2023 at 10:31 AM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
> >
> > On Thu, Mar 30, 2023 at 3:26 PM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
> > >
> > > > On 30 Mar 2023, at 04:57, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > > As another idea, why don't we use macros for that? For example,
> > > > suppose VacuumCostStatus is like:
> > > >
> > > > typedef enum VacuumCostStatus
> > > > {
> > > > VACUUM_COST_INACTIVE_LOCKED = 0,
> > > > VACUUM_COST_INACTIVE,
> > > > VACUUM_COST_ACTIVE,
> > > > } VacuumCostStatus;
> > > > VacuumCostStatus VacuumCost;
> > > >
> > > > non-vacuum code can use the following macros:
> > > >
> > > > #define VacuumCostActive() (VacuumCost == VACUUM_COST_ACTIVE)
> > > > #define VacuumCostInactive() (VacuumCost <= VACUUM_COST_INACTIVE) //
> > > > or we can use !VacuumCostActive() instead.
> > >
> > > I'm in favor of something along these lines. A variable with a name that
> > > implies a boolean value (active/inactive) but actually contains a tri-value is
> > > easily misunderstood. A VacuumCostState tri-value variable (or a better name)
> > > with a set of convenient macros for extracting the boolean active/inactive that
> > > most of the code needs to be concerned with would more for more readable code I
> > > think.
> >
> > The macros are very error-prone. I was just implementing this idea and
> > mistakenly tried to set the macro instead of the variable in multiple
> > places. Avoiding this involves another set of macros, and, in the end, I
> > think the complexity is much worse. Given the reviewers' uniform dislike
> > of VacuumCostInactive, I favor going back to two variables
> > (VacuumCostActive + VacuumFailsafeActive) and moving
> > LVRelState->failsafe_active to the global VacuumFailsafeActive.
> >
> > I will reimplement this in the next version.

Thank you for updating the patches. Here are comments for 0001, 0002,
and 0003 patches:

0001:

@@ -391,7 +389,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
Assert(params->index_cleanup != VACOPTVALUE_UNSPECIFIED);
Assert(params->truncate != VACOPTVALUE_UNSPECIFIED &&
params->truncate != VACOPTVALUE_AUTO);
- vacrel->failsafe_active = false;
+ VacuumFailsafeActive = false;

If we go with the idea of using VacuumCostActive +
VacuumFailsafeActive, we need to make sure that both are cleared at
the end of the vacuum per table. Since the patch clears it only here,
it remains true even after vacuum() if we trigger the failsafe mode
for the last table in the table list.

In addition to that, to ensure that also in an error case, I think we
need to clear it also in PG_FINALLY() block in vacuum().

---
@@ -306,6 +306,7 @@ extern PGDLLIMPORT pg_atomic_uint32
*VacuumSharedCostBalance;
extern PGDLLIMPORT pg_atomic_uint32 *VacuumActiveNWorkers;
extern PGDLLIMPORT int VacuumCostBalanceLocal;

+extern bool VacuumFailsafeActive;

Do we need PGDLLIMPORT for VacuumFailSafeActive?

0002:

@@ -2388,6 +2398,7 @@ vac_max_items_to_alloc_size(int max_items)
return offsetof(VacDeadItems, items) +
sizeof(ItemPointerData) * max_items;
}

+
/*
* vac_tid_reaped() -- is a particular tid deletable?
*

Unnecessary new line. There are some other unnecessary new lines in this patch.

---
@@ -307,6 +309,8 @@ extern PGDLLIMPORT pg_atomic_uint32 *VacuumActiveNWorkers;
extern PGDLLIMPORT int VacuumCostBalanceLocal;

extern bool VacuumFailsafeActive;
+extern int VacuumCostLimit;
+extern double VacuumCostDelay;

and

@@ -266,8 +266,6 @@ extern PGDLLIMPORT int max_parallel_maintenance_workers;
extern PGDLLIMPORT int VacuumCostPageHit;
extern PGDLLIMPORT int VacuumCostPageMiss;
extern PGDLLIMPORT int VacuumCostPageDirty;
-extern PGDLLIMPORT int VacuumCostLimit;
-extern PGDLLIMPORT double VacuumCostDelay;

Do we need PGDLLIMPORT too?

---
@@ -1773,20 +1773,33 @@ FreeWorkerInfo(int code, Datum arg)
}
}

+
/*
- * Update the cost-based delay parameters, so that multiple workers consume
- * each a fraction of the total available I/O.
+ * Update vacuum cost-based delay-related parameters for autovacuum workers and
+ * backends executing VACUUM or ANALYZE using the value of relevant gucs and
+ * global state. This must be called during setup for vacuum and after every
+ * config reload to ensure up-to-date values.
*/
void
-AutoVacuumUpdateDelay(void)
+VacuumUpdateCosts(void)
{

Isn't it better to define VacuumUpdateCosts() in vacuum.c rather than
autovacuum.c as this is now a common code for both vacuum and
autovacuum?

0003:

@@ -501,9 +502,9 @@ vacuum(List *relations, VacuumParams *params,
{
ListCell *cur;

- VacuumUpdateCosts();
in_vacuum = true;
- VacuumCostActive = (VacuumCostDelay > 0);
+ VacuumFailsafeActive = false;
+ VacuumUpdateCosts();

Hmm, if we initialize VacuumFailsafeActive here, should it be included
in 0001 patch?

---
+ if (VacuumCostDelay > 0)
+ VacuumCostActive = true;
+ else
+ {
+ VacuumCostActive = false;
+ VacuumCostBalance = 0;
+ }

I agree to update VacuumCostActive in VacuumUpdateCosts(). But if we
do that I think this change should be included in 0002 patch.

---
+ if (ConfigReloadPending && !analyze_in_outer_xact)
+ {
+ ConfigReloadPending = false;
+ ProcessConfigFile(PGC_SIGHUP);
+ VacuumUpdateCosts();
+ }

Since analyze_in_outer_xact is false by default, we reload the config
file in vacuum_delay_point() by default. We need to note that
vacuum_delay_point() could be called via other paths, for example
gin_cleanup_pending_list() and ambulkdelete() called by
validate_index(). So it seems to me that we should do the opposite; we
have another global variable, say vacuum_can_reload_config, which is
false by default, and is set to true only when vacuum() allows it. In
vacuum_delay_point(), we reload the config file iff
(ConfigReloadPending && vacuum_can_reload_config).

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2023-04-03 03:21:06 Re: running logical replication as the subscription owner
Previous Message Yurii Rashkovskii 2023-04-03 02:26:25 Re: [PATCH] Support % wildcard in extension upgrade filenames