Re: Eliminate redundant tuple visibility check in vacuum

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, David Geier <geidav(dot)pg(at)gmail(dot)com>
Subject: Re: Eliminate redundant tuple visibility check in vacuum
Date: 2023-09-07 22:23:22
Message-ID: CAAKRu_YRv4OvT3t4Dw+8UCuji2aVmQxmkJpXvqvTBTWBHEJ-ww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 7, 2023 at 3:30 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Thu, Sep 7, 2023 at 3:10 PM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
> > I can fix it by changing the type of PruneResult->off_loc to be an
> > OffsetNumber pointer. This does mean that PruneResult will be
> > initialized partially by heap_page_prune() callers. I wonder if you
> > think that undermines the case for making a new struct.
>
> I think that it undermines the case for including that particular
> argument in the struct. It's not really a Prune*Result* if the caller
> initializes it in part. It seems fairly reasonable to still have a
> PruneResult struct for the other stuff, though, at least to me. How do
> you see it?

Yes, I think off_loc probably didn't belong in PruneResult to begin with.
It is inherently not a result of pruning since it will only be used in
the event that pruning doesn't complete (it errors out).

In the attached v4 patch set, I have both PruneResult and off_loc as
parameters to heap_page_prune(). I have also added a separate commit
which adds comments both above heap_page_prune()'s call site in
lazy_scan_prune() and in the heap_page_prune() function header
clarifying the various points we discussed.

> > I still want to eliminate the NULL check of off_loc in
> > heap_page_prune() by making it a required parameter. Even though
> > on-access pruning does not have an error callback mechanism which uses
> > the offset, it seems better to have a pointless local variable in
> > heap_page_prune_opt() than to do a NULL check for every tuple pruned.
>
> It doesn't seem important to me unless it improves performance. If
> it's just stylistic, I don't object, but I also don't really see a
> reason to care.

I did some performance testing but, as expected, I couldn't concoct a
scenario where the overhead was noticeable in a profile. So much else
is happening in that code, the NULL check basically doesn't matter
(even though it isn't optimized out).

I mostly wanted to remove the NULL checks because I found them
distracting (so, a stylistic complaint). However, upon further
reflection, I actually think it is better if heap_page_prune_opt()
passes NULL. heap_page_prune() has no error callback mechanism that
could use it, and passing a valid value implies otherwise. Also, as
you said, off_loc will always be InvalidOffsetNumber when
heap_page_prune() returns normally, so having heap_page_prune_opt()
pass a dummy value might actually be more confusing for future
programmers.

> > > I haven't run the regression suite with 0001 applied. I'm guessing
> > > that you did, and that they passed, which if true means that we don't
> > > have a test for this, which is a shame, although writing such a test
> > > might be a bit tricky. If there is a test case for this and you didn't
> > > run it, woops.
> >
> > There is no test coverage for the vacuum error callback context
> > currently (tests passed for me). I looked into how we might add such a
> > test. First, I investigated what kind of errors might occur during
> > heap_prune_satisfies_vacuum(). Some of the multixact functions called
> > by HeapTupleSatisfiesVacuumHorizon() could error out -- for example
> > GetMultiXactIdMembers(). It seems difficult to trigger the errors in
> > GetMultiXactIdMembers(), as we would have to cause wraparound. It
> > would be even more difficult to ensure that we hit those specific
> > errors from a call stack containing heap_prune_satisfies_vacuum(). As
> > such, I'm not sure I can think of a way to protect future developers
> > from repeating my mistake--apart from improving the comment like you
> > mentioned.
>
> 004_verify_heapam.pl has some tests that intentionally corrupt pages
> and then use pg_amcheck to detect the corruption. Such an approach
> could probably also be used here. But it's a pain to get such tests
> right, because any change to the page format due to endianness,
> different block size, or whatever can make carefully-written tests go
> boom.

Cool! I hadn't examined how these tests work until now. I took a stab
at writing a test in the existing 0004_verify_heapam.pl. The simplest
thing would be if we could just vacuum the corrupted table ("test")
after running pg_amcheck and compare the error context to our
expectation. I found that this didn't work, though. In an assert
build, vacuum trips an assert before it hits an error while vacuuming
a corrupted tuple in the "test" table. There might be a way of
modifying the existing test code to avoid this, but I tried the next
easiest thing -- corrupting a tuple in the other existing table in the
file, "junk". This is possible to do, but we have to repeat a lot of
the setup code done for the "test" table to get the line pointer array
and loop through and corrupt a tuple. In order to do this well, I
would want to refactor some of the boilerplate into a function. There
are other fiddly bits as well that I needed to consider. I think a
test like this could be useful coverage of the some of the possible
errors that could happen in heap_prune_satisfies_vacuum(), but it
definitely isn't coverage of pg_amcheck (and thus shouldn't go in that
file) and a whole new test which spins up a Postgres to cover
vacuum_error_callback() seemed like a bit much.

- Melanie

Attachment Content-Type Size
v4-0002-Move-heap_page_prune-output-parameters-into-struc.patch text/x-patch 7.9 KB
v4-0003-Reuse-heap_page_prune-tuple-visibility-statuses.patch text/x-patch 10.8 KB
v4-0001-Clarify-usage-of-heap_page_prune-parameter-off_lo.patch text/x-patch 2.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2023-09-07 22:28:34 Re: Add 'worker_type' to pg_stat_subscription
Previous Message Nathan Bossart 2023-09-07 21:54:13 Re: Document that server will start even if it's unable to open some TCP/IP ports