Re: Eliminate redundant tuple visibility check in vacuum

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, David Geier <geidav(dot)pg(at)gmail(dot)com>
Subject: Re: Eliminate redundant tuple visibility check in vacuum
Date: 2023-09-07 17:37:35
Message-ID: CA+TgmoZ8E7DXzAZkV=RF9tfJkQFH=a3KkjQhYNY9W26Jh7AN+w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 6, 2023 at 5:21 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
> Yeah, I think this is a fair concern. I have addressed it in the
> attached patches.
>
> I thought a lot about whether or not adding a PruneResult which
> contains only the output parameters and result of heap_page_prune() is
> annoying since we have so many other *Prune* data structures. I
> decided it's not annoying. In some cases, four outputs don't merit a
> new structure. In this case, I think it declutters the code a bit --
> independent of any other patches I may be writing :)

I took a look at 0001 and I think that it's incorrect. In the existing
code, *off_loc gets updated before each call to
heap_prune_satisfies_vacuum(). This means that if an error occurs in
heap_prune_satisfies_vacuum(), *off_loc will as of that moment contain
the relevant offset number. In your version, the relevant offset
number will only be stored in some local structure to which the caller
doesn't yet have access. The difference is meaningful. lazy_scan_prune
passes off_loc as vacrel->offnum, which means that if an error
happens, vacrel->offnum will have the right value, and so when the
error context callback installed by heap_vacuum_rel() fires, namely
vacuum_error_callback(), it can look at vacrel->offnum and know where
the error happened. With your patch, I think that would no longer
work.

I haven't run the regression suite with 0001 applied. I'm guessing
that you did, and that they passed, which if true means that we don't
have a test for this, which is a shame, although writing such a test
might be a bit tricky. If there is a test case for this and you didn't
run it, woops. This is also why I think it's *extremely* important for
the header comment of a function that takes pointer parameters to
document the semantics of those pointers. Normally they are in
parameters or out parameters or in-out parameters, but here it's
something even more complicated. The existing header comment says
"off_loc is the offset location required by the caller to use in error
callback," which I didn't really understand until I actually looked at
what the code is doing, so I consider that somebody could've done a
better job writing this comment, but in theory you could've also
noticed that, at least AFAICS, there's no way for the function to
return with *off_loc set to anything other than InvalidOffsetNumber.
That means that the statements which set *off_loc to other values must
have some other purpose.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Markus Winand 2023-09-07 17:51:24 Re: Wrong command name in writeable-CTE related error messages
Previous Message Bruce Momjian 2023-09-07 16:58:29 Re: [PATCH] pgrowlocks: Make mode names consistent with docs