Re: Combine Prune and Freeze records emitted by vacuum

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: Combine Prune and Freeze records emitted by vacuum
Date: 2024-03-13 23:25:56
Message-ID: 20240313232556.ohv3z7qzc5wtwqh5@liskov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 11, 2024 at 6:38 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 09/03/2024 22:41, Melanie Plageman wrote:
> > On Wed, Mar 6, 2024 at 7:59 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> >> Does GlobalVisTestIsRemovableXid() handle FrozenTransactionId correctly?
> >
> > Okay, so I thought a lot about this, and I don't understand how
> > GlobalVisTestIsRemovableXid() would not handle FrozenTransactionId
> > correctly.
> >
> > vacrel->cutoffs.OldestXmin is computed initially from
> > GetOldestNonRemovableTransactionId() which uses ComputeXidHorizons().
> > GlobalVisState is updated by ComputeXidHorizons() (when it is
> > updated).
> >
> > I do see that the comment above GlobalVisTestIsRemovableXid() says
> >
> > * It is crucial that this only gets called for xids from a source that
> > * protects against xid wraparounds (e.g. from a table and thus protected by
> > * relfrozenxid).
> >
> > and then in
> >
> > * Convert 32 bit argument to FullTransactionId. We can do so safely
> > * because we know the xid has to, at the very least, be between
> > * [oldestXid, nextXid), i.e. within 2 billion of xid.
> >
> > I'm not sure what oldestXid is here.
> > It's true that I don't see GlobalVisTestIsRemovableXid() being called
> > anywhere else with an xmin as an input. I think that hints that it is
> > not safe with FrozenTransactionId. But I don't see what could go
> > wrong.
> >
> > Maybe it has something to do with converting it to a FullTransactionId?
> >
> > FullTransactionIdFromU64(U64FromFullTransactionId(rel) + (int32)
> > (xid - rel_xid));
> >
> > Sorry, I couldn't quite figure it out :(
>
> I just tested it. No, GlobalVisTestIsRemovableXid does not work for
> FrozenTransactionId. I just tested it with state->definitely_needed ==
> {0, 4000000000} and xid == FrozenTransactionid, and it incorrectly
> returned 'false'. It treats FrozenTransactionId as if was a regular xid '2'.

I see. Looking at the original code:
if (!TransactionIdPrecedes(xmin,
vacrel->cutoffs.OldestXmin))

And the source code for TransactionIdPrecedes:

if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
return (id1 < id2);

diff = (int32) (id1 - id2);
return (diff < 0);

In your example, It seems like you mean GlobalVisState->maybe_needed is
0 and GlobalVisState->definitely_needed = 4000000000. In this example,
if vacrel->cutoffs.OldestXmin was 0, we would get a wrong answer also.

But, I do see that the comparison done by TransactionIdPrecedes() is is
quite different than that done by FullTransactionIdPrecedes() because of
the modulo 2**32 arithmetic.

Solving the handling of FrozenTransactionId specifically by
GlobalVisTestIsRemovableXid() seems like it would be done easily in our
case by wrapping it in a function which checks if
TransactionIdIsNormal() and returns true if it is not normal. But, I'm
not sure if I am missing the larger problem.

> >> The XLOG_HEAP2_FREEZE_PAGE record is a little smaller than
> >> XLOG_HEAP2_PRUNE. But we could optimize the XLOG_HEAP2_PRUNE format for
> >> the case that there's no pruning, just freezing. The record format
> >> (xl_heap_prune) looks pretty complex as it is, I think it could be made
> >> both more compact and more clear with some refactoring.
> >
> > I'm happy to change up xl_heap_prune format. In its current form,
> > according to pahole, it has no holes and just 3 bytes of padding at
> > the end.
> >
> > One way we could make it smaller is by moving the isCatalogRel member
> > to directly after snapshotConflictHorizon and then adding a flags
> > field and defining flags to indicate whether or not other members
> > exist at all. We could set bits for HAS_FREEZE_PLANS, HAS_REDIRECTED,
> > HAS_UNUSED, HAS_DEAD. Then I would remove the non-optional uint16
> > nredirected, nunused, nplans, ndead and just put the number of
> > redirected/unused/etc at the beginning of the arrays containing the
> > offsets.
>
> Sounds good.
>
> > Then I could write various macros for accessing them. That
> > would make it smaller, but it definitely wouldn't make it less complex
> > (IMO).
>
> I don't know, it might turn out not that complex. If you define the
> formats of each of those "sub-record types" as explicit structs, per
> attached sketch, you won't need so many macros. Some care is still
> needed with alignment though.

In the attached v2, I've done as you suggested and made all members
except flags and snapshotConflictHorizon optional in the xl_heap_prune
struct and obsoleted the xl_heap_freeze struct. I've kept the actual
xl_heap_freeze_page struct and heap_xlog_freeze_page() function so that
we can replay previously made XLOG_HEAP2_FREEZE_PAGE records.

Because we may set line pointers unused during vacuum's first pass, I
couldn't use the presence of nowunused without redirected or dead items
to indicate that this was a record emitted by vacuum's second pass. As
such, I haven't obsoleted the xl_heap_vacuum struct. I was thinking I
could add a flag that indicates the record was emitted by vacuum's
second pass? We would want to distinguish this so that we could set the
items unused without calling heap_page_prune_execute() -- because that
calls PageRepairFragmentation() which requires a full cleanup lock.

I introduced a few sub-record types similar to what you suggested --
they help a bit with alignment, so I think they are worth keeping. There
are comments around them, but perhaps a larger diagram of the layout of
a the new XLOG_HEAP2_PRUNE record would be helpful.

There is a bit of duplicated code between heap_xlog_prune() and
heap2_desc() since they both need to deserialize the record. Before the
code to do this was small and it didn't matter, but it might be worth
refactoring it that way now.

Note that I've made all of the changes related to obsoleting the
XLOG_HEAP2_FREEZE_PAGE record in separate commits on top of the rest of
the set for ease of review. However, I've rebased the other review
feedback into the relevant commits.

On Wed, Mar 6, 2024 at 7:59 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> I don't think we need XLOG_HEAP2_FREEZE_PAGE as a separate record type
> anymore. By removing that, you also get rid of the freeze-only codepath
> near the end of heap_page_prune(), and the
> heap_freeze_execute_prepared() function.
>
> The XLOG_HEAP2_FREEZE_PAGE record is a little smaller than
> XLOG_HEAP2_PRUNE. But we could optimize the XLOG_HEAP2_PRUNE format for
> the case that there's no pruning, just freezing. The record format
> (xl_heap_prune) looks pretty complex as it is, I think it could be made
> both more compact and more clear with some refactoring.

On the point of removing the freeze-only code path from
heap_page_prune() (now heap_page_prune_and_freeze()): while doing this,
I realized that heap_pre_freeze_checks() was not being called in the
case that we decide to freeze because we emitted an FPI while setting
the hint bit. I've fixed that, however, I've done so by moving
heap_pre_freeze_checks() into the critical section. I think that is not
okay? I could move it earlier and not do call it when the hint bit FPI
leads us to freeze tuples. But, I think that would lead to us doing a
lot less validation of tuples being frozen when checksums are enabled.
Or, I could make two critical sections?

> FreezeMultiXactId still takes a separate 'cutoffs' arg, but it could use
> pagefrz->cutoffs now.

Fixed this.

> HeapPageFreeze has two "trackers", for the "freeze" and "no freeze"
> cases. heap_page_prune() needs to track both, until it decides whether
> to freeze or not. But it doesn't make much sense that the caller
> (lazy_scan_prune()) has to initialize both, and has to choose which of
> the values to use after the call depending on whether heap_page_prune()
> froze or not. The two trackers should be just heap_page_prune()'s
> internal business.

I've added new_relminmxid and new_relfrozenxid to PruneFreezeResult and
set them appropriately in heap_page_prune_and_freeze().

It's a bit sad because if it wasn't for vacrel->skippedallvis,
vacrel->NewRelfrozenXid and vacrel->NewRelminMxid would be
vacrel->cutoffs.OldestXmin and vacrel->cutoffs.OldestMxact respectively
and we could avoid having lazy_scan_prune() initializing the
HeapPageFreeze altogether and just pass VacuumCutoffs (and
heap_page_prune_opt() could pass NULL) to heap_page_prune_and_freeze().

I think it is probably worse to add both of them as additional optional
arguments, so I've just left lazy_scan_prune() with the job of
initializing them.

In heap_page_prune_and_freeze(), I initialize new_relminmxid and
new_relfrozenxid to InvalidMultiXactId and InvalidTransactionId
respectively because on-access pruning doesn't have a value to set them
to. But I wasn't sure if this was okay -- since I don't see validation
that TransactionIdIsValid() in vac_update_relstats(). It will work now
-- just worried about future issues. I could add an assert there?

> HeapPageFreeze is a bit confusing in general, as it's both an input and
> an output to heap_page_prune(). Not sure what exactly to do there, but I
> feel that we should make heap_page_prune()'s interface more clear.
> Perhaps move the output fields to PruneResult.

HeapPageFrz is now only an input argument to
heap_page_prune_and_freeze() as of the commit in which
heap_page_prune_and_freeze() becomes responsible for executing freezing.
It is still an in/out param to earlier commits because we still need
info from it to execute freezing in lazy_scan_prune().

> Let's rename heap_page_prune() to heap_page_prune_and_freeze(), as
> freezing is now an integral part of the function. And mention it in the
> function comment, too.

I've done this.

> In heap_prune_chain:
>
> >  * Tuple visibility information is provided in presult->htsv.
>
> Not this patch's fault directly, but it's not immediate clear what "is
> provided" means here. Does the caller provide it, or does the function
> set it, ie. is it an input or output argument? Looking at the code, it's
> an input, but now it looks a bit weird that an input argument is called
> 'presult'.

I haven't updated the comments about this in the intermediate commits
since it ends up in the PruneState as an input.

- Melanie

Attachment Content-Type Size
v2-0001-lazy_scan_prune-tests-tuple-vis-with-GlobalVisTes.patch text/x-diff 1.7 KB
v2-0002-Pass-heap_prune_chain-PruneResult-output-paramete.patch text/x-diff 3.3 KB
v2-0003-heap_page_prune-sets-all_visible-and-frz_conflict.patch text/x-diff 18.4 KB
v2-0004-Add-reference-to-VacuumCutoffs-in-HeapPageFreeze.patch text/x-diff 12.4 KB
v2-0005-Prepare-freeze-tuples-in-heap_page_prune.patch text/x-diff 13.3 KB
v2-0006-lazy_scan_prune-reorder-freeze-execution-logic.patch text/x-diff 6.9 KB
v2-0007-Execute-freezing-in-heap_page_prune.patch text/x-diff 27.4 KB
v2-0008-Make-opp-freeze-heuristic-compatible-with-prune-f.patch text/x-diff 4.4 KB
v2-0009-Separate-tuple-pre-freeze-checks-and-invoke-earli.patch text/x-diff 5.6 KB
v2-0010-Inline-heap_freeze_execute_prepared.patch text/x-diff 8.4 KB
v2-0011-Exit-heap_page_prune-early-if-no-prune.patch text/x-diff 8.3 KB
v2-0012-Merge-prune-and-freeze-records.patch text/x-diff 8.9 KB
v2-0013-Set-hastup-in-heap_page_prune.patch text/x-diff 7.2 KB
v2-0014-Count-tuples-for-vacuum-logging-in-heap_page_prun.patch text/x-diff 15.1 KB
v2-0015-Save-dead-tuple-offsets-during-heap_page_prune.patch text/x-diff 7.0 KB
v2-0016-Obsolete-XLOG_HEAP2_FREEZE_PAGE.patch text/x-diff 13.1 KB
v2-0017-Streamline-XLOG_HEAP2_PRUNE-record.patch text/x-diff 19.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2024-03-13 23:35:27 Re: Vectored I/O in bulk_write.c
Previous Message David Rowley 2024-03-13 23:00:24 Re: pg16: XX000: could not find pathkey item to sort