Re: New strategies for freezing, advancing relfrozenxid early

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: New strategies for freezing, advancing relfrozenxid early
Date: 2023-01-05 01:21:37
Message-ID: CAEze2WjuQ3xpGAd7+VwsmqDU3RjFMcGZP3WRRFSiA9Yn=pxsrw@mail.gmail.com
Lists: pgsql-hackers

On Tue, 3 Jan 2023 at 21:30, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> Attached is v14.

Some review comments on these versions of the patches (code review
only so far; untested):

> [PATCH v14 1/3] Add eager and lazy freezing strategies to VACUUM.

> + /*
> + * Threshold cutoff point (expressed in # of physical heap rel blocks in
> + * rel's main fork) that triggers VACUUM's eager freezing strategy
> + */

I don't think the mention of 'cutoff point' is necessary when the
comment already says 'Threshold'.

> + int freeze_strategy_threshold; /* threshold to use eager
> [...]
> + BlockNumber freeze_strategy_threshold;

Is there a way to disable the 'eager' freezing strategy? An `int` GUC
cannot hold the maximum BlockNumber...
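To illustrate the range mismatch (a standalone sketch; the typedef and
constant mirror src/include/storage/block.h):

#include <limits.h>
#include <stdio.h>

typedef unsigned int BlockNumber;                  /* as in block.h */
#define MaxBlockNumber ((BlockNumber) 0xFFFFFFFE)  /* as in block.h */

int
main(void)
{
    /* An int GUC tops out at INT_MAX ... */
    printf("INT_MAX        = %d\n", INT_MAX);        /* 2147483647 */
    /* ... but a relation can be up to MaxBlockNumber blocks long */
    printf("MaxBlockNumber = %u\n", MaxBlockNumber); /* 4294967294 */
    return 0;
}

So even the largest configurable threshold still leaves eager freezing
enabled for relations larger than INT_MAX blocks.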

> + lazy_scan_strategy(vacrel);
> if (verbose)

I'm slightly surprised you didn't update the message for verbose vacuum
to indicate whether the eager strategy was used: there are several GUCs
for tuning this behaviour, so you'd expect to want direct confirmation
that the configuration is effective.
(looks at further patches) I see that the message for verbose vacuum
sees significant changes in patch 2 instead.

---

> [PATCH v14 2/3] Add eager and lazy VM strategies to VACUUM.

General comments:

I don't see anything regarding scan synchronization in the vmsnap scan
system. I understand that VACUUM is a load that is significantly
different from normal SEQSCANs, but are there good reasons to _not_
synchronize the start of VACUUM?

Right now, we don't use syncscan to determine a start point. I can't
find the reason why in the available documentation: [0] discusses the
issue, but without clearly describing why it wouldn't be worthwhile
from a 'nothing lost' perspective.
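For illustration, a minimal sketch of what that could look like
(hypothetical placement; ss_get_location() is the existing API from
access/syncscan.h, and rel/rel_pages are the LVRelState fields from
vacuumlazy.c):

/*
 * Hypothetical sketch: start VACUUM's heap pass near where concurrent
 * sequential scans currently are, rather than always at block 0.
 */
static BlockNumber
vacuum_scan_start(LVRelState *vacrel)
{
    return ss_get_location(vacrel->rel, vacrel->rel_pages);
}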

In addition, I noticed that progress reporting of blocks scanned
("heap_blocks_scanned", duh) includes skipped pages. Now that we have
a solid grasp of how many blocks we're actually planning to scan, we
could report that planned total (and progress against it) instead,
increasing the user value of that progress view.

[0] https://www.postgresql.org/message-id/flat/19398.1212328662%40sss.pgh.pa.us#17b2feb0fde6a41779290632d8c70ef1
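Back on the progress reporting: concretely, I was thinking of
something like this (a sketch; pgstat_progress_update_param() and
PROGRESS_VACUUM_TOTAL_HEAP_BLKS exist today, while
scanned_pages_planned is a hypothetical name for the page count the
vmsnap gives us up front):

/*
 * Sketch: report the number of pages this VACUUM will actually scan
 * as the total, so heap_blks_scanned can converge on heap_blks_total
 * instead of counting skipped pages as "scanned".
 */
pgstat_progress_update_param(PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
                             vacrel->scanned_pages_planned);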

> + double tableagefrac;

I think this field could use some extra documentation: that it
expresses how "old" the relfrozenxid and relminmxid values are, as a
fraction between 0 (the latest possible values: nextXID and nextMXID)
and 1 (values that are at least freeze_table_age and
multixact_freeze_table_age (multi)transaction IDs old, respectively).
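Something like this, perhaps (just a sketch of the suggested wording):

/*
 * tableagefrac: how "old" the table's relfrozenxid and relminmxid
 * are, expressed as a fraction: 0.0 means the latest possible values
 * (nextXID and nextMXID); 1.0 means values at least freeze_table_age
 * and multixact_freeze_table_age (multi)transaction IDs old,
 * respectively.
 */
double      tableagefrac;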

> -#define VACOPT_DISABLE_PAGE_SKIPPING 0x80 /* don't skip any pages */
> +#define VACOPT_DISABLE_PAGE_SKIPPING 0x80 /* don't skip using VM */

I'm not super happy with this change. I don't think we should touch
the VM using snapshots _at all_ when disable_page_skipping is set:

> + * Decide vmsnap scanning strategy.
> *
> - * This test also enables more frequent relfrozenxid advancement during
> - * non-aggressive VACUUMs. If the range has any all-visible pages then
> - * skipping makes updating relfrozenxid unsafe, which is a real downside.
> + * First acquire a visibility map snapshot, which determines the number of
> + * pages that each vmsnap scanning strategy is required to scan for us in
> + * passing.

I think we should not take disk-backed VM snapshots when
force_scan_all is set. We need VACUUM to be able to run in very
resource-constrained environments, and this doesn't allow for that: it
adds a disk space requirement for the VM snapshot for all but the
smallest relation sizes (at two VM bits per heap page, a snapshot is
roughly 1/32768th of the table size, i.e. ~32MB for a 1TB table),
which is bad when you realize that we need VACUUM precisely when we
want to clean up things like CLOG.

Additionally, it took me several reads of the code and comments to
understand what the decision-making process for lazy vs. eager is, and
why. The comments are interspersed with the code, with no single place
that describes it from a bird's-eye view. I think something like the
following would be appreciated by other readers of the code (a
code-level sketch of the same rule follows the list):

+ We determine whether to use the eager or the lazy scanning strategy
based on how many extra pages the eager strategy would scan over the
lazy strategy, and on how "old" the table is (as determined in
tableagefrac):
+ While a table is still "young" (tableagefrac <
TABLEAGEFRAC_MIDPOINT), the eager strategy is chosen only if it would
scan at most 5% (MAX_PAGES_YOUNG_TABLEAGE) more of the table than the
lazy strategy.
+ As the table gets "older" (tableagefrac between MIDPOINT and
HIGHPOINT), the threshold for eager scanning is relaxed linearly from
that 5% to 70% (MAX_PAGES_OLD_TABLEAGE) of the table being scanned
extra (over what the lazy strategy would scan).
+ Once tableagefrac passes HIGHPOINT, we stop considering the lazy
strategy, and always scan the table eagerly.
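In code form, my reading of that rule is roughly this (a sketch only;
constant names as used above, with the 5%/70% values assumed from the
patch's comments):

/*
 * Sketch: the maximum fraction of rel_pages that eager scanning may
 * add over lazy scanning while still choosing the eager strategy.
 */
static double
eager_extra_scan_cap(double tableagefrac)
{
    if (tableagefrac < TABLEAGEFRAC_MIDPOINT)
        return MAX_PAGES_YOUNG_TABLEAGE;    /* flat 5% while "young" */

    if (tableagefrac < TABLEAGEFRAC_HIGHPOINT)
        /* relax linearly from 5% to 70% as the table "ages" */
        return MAX_PAGES_YOUNG_TABLEAGE +
            (MAX_PAGES_OLD_TABLEAGE - MAX_PAGES_YOUNG_TABLEAGE) *
            (tableagefrac - TABLEAGEFRAC_MIDPOINT) /
            (TABLEAGEFRAC_HIGHPOINT - TABLEAGEFRAC_MIDPOINT);

    return 1.0;             /* past HIGHPOINT: always scan eagerly */
}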

> @@ -1885,6 +1902,30 @@ retry:
> tuples_frozen = 0; /* avoid miscounts in instrumentation */
> }
>
> /*
> + * There should never be dead or deleted tuples when PD_ALL_VISIBLE is
> + * set. Check that here in passing.
> + *
> [...]

I'm not sure this patch is the appropriate place for this added check.
I don't disagree with the change, I just think that it's unrelated to
the rest of the patch. Same with some of the changes in
lazy_scan_heap.

> +vm_snap_stage_blocks

Doesn't this waste a lot of cycles skipping frozen blocks one by one
when most of the relation is frozen? I'd have expected something more
like byte- or word-wise processing of skippable blocks, as opposed to
this per-block loop. I don't think it's strictly necessary to address
in this patch, but I think it would be a very useful addition for
those with larger tables.
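For example, something word-at-a-time along these lines (a standalone
sketch over a hypothetical skip bitmap, not the patch's actual data
structures; __builtin_ctzll for brevity, where pg_bitutils.h has
portable equivalents):

#include <stdint.h>

/*
 * Sketch: return the first block >= blkno that must be scanned, where
 * a set bit in skipmap means "skippable".  Fully-skippable 64-block
 * words are stepped over without any per-block iteration.
 */
static uint32_t
next_unskippable_block(const uint64_t *skipmap, uint32_t nblocks,
                       uint32_t blkno)
{
    while (blkno < nblocks)
    {
        uint64_t    word = skipmap[blkno / 64];

        /* pretend bits below blkno within this word are skippable */
        word |= (UINT64_C(1) << (blkno % 64)) - 1;

        if (word != ~UINT64_C(0))
        {
            /* first zero bit is the next block that needs scanning */
            uint32_t    found = (blkno & ~63U) + __builtin_ctzll(~word);

            return found < nblocks ? found : nblocks;
        }

        /* whole word skippable: jump to the next 64-block boundary */
        blkno = (blkno & ~63U) + 64;
    }
    return nblocks;             /* nothing left to scan */
}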

> + XIDFrac = (double) (nextXID - cutoffs->relfrozenxid) /
> + ((double) freeze_table_age + 0.5);

I don't quite understand what this `+ 0.5` is used for; could you explain?

> + [...] Freezing and advancing
> + <structname>pg_class</structname>.<structfield>relfrozenxid</structfield>
> + now take place more proactively, in every
> + <command>VACUUM</command> operation.

This claim that it happens more proactively in "every" VACUUM
operation is false, so I think removing "every" would be better.

---

> [PATCH v14 3/3] Finish removing aggressive mode VACUUM.

I've not completed a review of this patch - I'll continue on that
tomorrow - but here's a first look:

I don't quite enjoy the combined refactoring and rewriting of the docs
section; it's difficult to determine what changed when so many things
changed line lengths and were moved around. Tomorrow I'll take a
closer look, but a separation of changed vs. moved content would be
useful for review.

> + /*
> + * Earliest permissible NewRelfrozenXid/NewRelminMxid values that can be
> + * set in pg_class at the end of VACUUM.
> + */
> + TransactionId MinXid;
> + MultiXactId MinMulti;

I don't quite like this wording, but I'm not sure what would be better.

> + cutoffs->MinXid = nextXID - (freeze_table_age * 0.95);
> [...]
> + cutoffs->MinMulti = nextMXID - (multixact_freeze_table_age * 0.95);

Why are these values adjusted down (up?) by 5%? If I configure this
GUC, I'd expect it to be used effectively verbatim, not adjusted by an
arbitrary factor: with the default freeze_table_age of 150 million,
MinXid would end up only 142.5 million XIDs behind nextXID.

---

That's it for now; thanks for working on this,

Kind regards,

Matthias van de Meent
