Re: New IndexAM API controlling index vacuum strategies

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Noah Misch <noah(at)leadboat(dot)com>
Subject: Re: New IndexAM API controlling index vacuum strategies
Date: 2021-03-31 22:27:58
Message-ID: CAH2-WzkCw8TQLMX_q87R=FMNjkjWWAidgDUrVtfvMHceHJkmgw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 31, 2021 at 9:29 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I can't effectively review 0001 because it both changes the code for
> individual functions significantly and reorders them within the file.
> I think it needs to be separated into two patches, one of which makes
> the changes and the other of which reorders stuff. I would probably
> vote for just dropping the second one, since I'm not sure there's
> really enough value there to justify the code churn, but if we're
> going to do it, I think it should definitely be done separately.

Thanks for the review!

I'll split it up that way. I think that I need to see it both ways
before deciding whether to push back on that. I will admit that I was
a bit zealous in rearranging things, because it seemed long overdue.
But I might well have gone too far.

> * "onerel" is a stupid naming convention that I'd rather not propagate
> further. It makes sense in the context of a function whose job it is
> to iterate over a list of relations and do something for each one. But
> once you're down into the code that only knows about one relation in
> the first place, calling that relation "onerel" rather than "rel" or
> "vacrel" or even "heaprel" is just confusing. Ditto for "onerelid".

I agree, and can change it, though at the cost of more diff churn.

> * Moving stuff from static variables into LVRelState seems like a
> great idea. Renaming it from LVRelStats seems like a good idea, too.

The static variables were bad, but nowhere near as bad as the
variables that are local to lazy_scan_heap(). They are currently a
gigantic mess.

Not that LVRelStats was much better. We have the latestRemovedXid
field in LVRelStats, and dead_tuples, but we *don't* have a bunch of
things that really are stats -- the division seems quite random.
Calling the struct LVRelStats was always dubious.
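
Just to illustrate what I have in mind (this is a sketch of my own,
with an invented field list -- not the actual contents of the struct
in 0001), something like:

/*
 * Hypothetical sketch only: the general shape of a consolidated
 * LVRelState, in the spirit of the LVRelStats -> LVRelState rename.
 * The field list here is my own illustration, not the patch.
 */
typedef struct LVRelState
{
    /* Target heap relation and its indexes */
    Relation    rel;
    Relation   *indrels;
    int         nindexes;

    /* Formerly file-level static state (cutoffs, etc.) */
    TransactionId OldestXmin;
    TransactionId FreezeLimit;
    MultiXactId MultiXactCutoff;

    /* Working state shared by the heap-scanning code */
    LVDeadTuples *dead_tuples;  /* TIDs to remove from indexes/heap */
    TransactionId latestRemovedXid;

    /* Things that genuinely are statistics */
    BlockNumber rel_pages;
    double      tuples_deleted;
    double      new_rel_tuples;
} LVRelState;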

> * Setting vacrel->lps = NULL "for now" when we already did palloc0 at
> allocation time seems counterproductive.

Okay, will fix.

> * The code associated with the comment block that says "Initialize
> state for a parallel vacuum" has been moved inside lazy_space_alloc().
> That doesn't seem like an especially good choice, because no casual
> reader is going to expect a function called lazy_space_alloc() to be
> entering parallel mode and so forth as a side effect. Also, the call
> to lazy_space_alloc() still has a comment that says "Allocate the
> space for dead tuples in case parallel vacuum is not initialized."
> even though the ParallelVacuumIsActive() check has been removed and
> the function now does a lot more than allocating space.

Will fix.
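
For what it's worth, the kind of separation I'm thinking of at the
lazy_scan_heap() call site would look roughly like this (the helper
name begin_parallel_vacuum_if_requested() is something I'm making up
for illustration, and the details are approximate):

    /*
     * Illustrative sketch only: keep lazy_space_alloc() purely about
     * allocating dead-tuple space, and make entering parallel mode an
     * explicit, separate step at the call site.
     */
    if (params->nworkers >= 0 && vacrel->nindexes > 1)
        vacrel->lps = begin_parallel_vacuum_if_requested(vacrel, params);

    if (!ParallelVacuumIsActive(vacrel))
    {
        /* Serial case: just allocate the local dead-tuple array */
        lazy_space_alloc(vacrel, vac_work_mem, nblocks);
    }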

> * lazy_scan_heap() removes the comment which begins "Note that
> vacrelstats->dead_tuples could have tuples which became dead after
> HOT-pruning but are not marked dead yet." But IIUC that special case
> is removed by a later patch, not 0001, in which case it is that patch
> that should be touching this comment.

Will fix.

> Regarding 0002:
>
> * It took me a while to understand why lazy_scan_new_page() and
> lazy_scan_empty_page() are named the way they are. I'm not sure
> exactly what would be better, so I am not necessarily saying I think
> you have to change anything, but for the record I think this naming
> sucks.

I agree -- it's dreadful.

> The reason we have "lazy" in here, AFAIU, is because originally
> we only had old-style VACUUM FULL, and that was the good hard-working
> VACUUM, and what we now think of as VACUUM was the "lazy" version that
> didn't really do the whole job. Then we decided it was the
> hard-working version that actually sucked and we always wanted to be
> lazy (or else rewrite the table). So now we have all of these
> functions named "lazy" which are really just functions to do "vacuum".

FWIW I always thought that the terminology was lifted from the world
of garbage collection. There is a thing called a lazy sweep algorithm.
Isn't vacuuming very much like sweeping? There are also mark-sweep
garbage collection algorithms that take two passes, one-phase
variants, and so on.

In general, the world of garbage collection has some ideas that might
be worth pilfering. A lot of it is totally irrelevant to our world,
but there is enough overlap for it to interest me. Though GC is such
a vast and complicated area that
it's difficult to know where to begin. I own a copy of the book
"Garbage Collection: Algorithms for Automatic Dynamic Memory
Management". Most of it goes over my head, but I have a feeling that
I'll get my $20 worth at some point.

> If, for example, we could agree that the whole thing is vacuum and the first
> time we touch the heap page that's the strawberry phase and then the
> second time we touch it that's the rhubarb phase, then we could have
> vacuum_strawberry_page(), vacuum_strawberry_new_page(),
> vacuum_rhubarb_phase(), etc. and everything would be a lot clearer,
> assuming that you replaced the words "strawberry" and "rhubarb" with
> something actually meaningful. But that seems hard. I thought about
> suggesting that the word for strawberry should be "prune", but it does
> more than that. I thought about suggesting that either the word for
> strawberry or the word for rhubarb should be "cleanup," but that's
> another word that is already confusingly overloaded. So I don't know.

Maybe we should just choose a novel name that isn't exactly
descriptive but is at least distinct and memorable.

I think that the word for strawberry should be "prune". This isn't
100% accurate, because it reduces the first phase to pruning. But it
is a terminology that has verisimilitude, which is no small thing.
The fact is that pruning is pretty much the point of the first phase
(freezing happens there too, but it is only considered for items that
survive pruning, so it doesn't undermine my point much). If we called
the first/strawberry pass over the heap "pruning" or "the prune
phase" then we'd have something much more practical and less
confusing than any available alternative that I can think of. Plus it
would still be fruit-based.

I think that our real problem is with rhubarb. I hate the use of the
terminology "heap vacuum" in the context of the second phase/pass.
Whatever terminology we use, we should frame the second phase as being
mostly about recycling LP_DEAD line pointers by turning them into
LP_UNUSED line pointers. We are recycling the space for "cells" that
get logically freed in the first phase (both in indexes, and finally
in the heap).
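
To make that framing concrete, the mechanical heart of the second
heap pass is really not much more than this (a simplified sketch --
the function name recycle_dead_items() is mine, and locking, WAL
logging, and visibility map handling are all omitted):

static void
recycle_dead_items(Page page, OffsetNumber *deadoffsets, int ndead)
{
    /*
     * Turn LP_DEAD line pointers (collected during the first phase,
     * and already removed from the indexes) into LP_UNUSED line
     * pointers, making them available for reuse.
     */
    for (int i = 0; i < ndead; i++)
    {
        ItemId      itemid = PageGetItemId(page, deadoffsets[i]);

        Assert(ItemIdIsDead(itemid) && !ItemIdHasStorage(itemid));
        ItemIdSetUnused(itemid);
    }

    /* Compact the page now that the line pointers are reusable */
    PageRepairFragmentation(page);
}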

I like the idea of framing the first phase as being concerned with the
logical database, while the second phase (which includes index
vacuuming and heap vacuuming) is concerned only with physical data
structures (so it's much dumber than the first). That's only ~99% true
today, but the third/"tupgone gone" patch will make it 100% true.

> * But all that having been said, it's easy to get confused and think
> that lazy_scan_new_page() is scanning a new page for lazy vacuum, but
> in fact it's the new-page handler for the scan phase of lazy vacuum,
> and it doesn't scan anything at all. If there's a way to avoid that
> kind of confusion, +1 from me.

This is another case where I need to see it the other way.

> * One possibility is that maybe it's not such a great idea to put this
> logic in its own function. I'm rather suspicious on principle of
> functions that are called with a locked or pinned buffer and release
> the lock or pin before returning. It suggests that the abstraction is
> not very clean.

I am sympathetic here. In fact most of those functions were added at
the suggestion of Andres. I think that they're fine, but it's
reasonable to wonder whether we're coming out ahead with any of them
other than lazy_scan_prune(). The reality is that they share state
fairly promiscuously, so I'm not really hiding complexity. The whole
idea here should be to remove inessential complexity in how we
represent and consume state.

However, it's totally different in the case of the one truly important
function among this group of new lazy_scan_heap() functions:
lazy_scan_prune(). It seems like a *huge* improvement to me. The
obvious advantage is that a call to lazy_scan_prune() can be read as
shorthand for "a blkno loop iteration that actually does real work".
Everything else in the calling loop is essentially either preamble or
postscript to lazy_scan_prune(), since we don't strictly need to set
VM bits, or to skip heap blocks, or to record free space in the FSM.
I think that that's a big difference.

There is a slightly less obvious advantage, too. It's clear that the
function as written does a good job of reducing state-related
complexity, because it effectively returns an LVPagePruneState (we
pass a pointer, but nothing gets initialized before the call to
lazy_scan_prune()). So now it's really obvious what state is managed
by pruning/freezing, and it's obvious what follows from that when we
return control to lazy_scan_heap(). This ties in with my first point
about pruning being the truly important piece of work.
lazy_scan_prune() really does hide complexity rather well, especially
compared to the other new functions from the second patch.
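
In outline, that leaves the main loop looking something like this (a
rough sketch; the exact lazy_scan_prune() signature and the
surrounding details are approximations):

    for (blkno = 0; blkno < nblocks; blkno++)
    {
        Buffer      buf;
        Page        page;
        LVPagePruneState prunestate;

        /*
         * Preamble: decide whether the visibility map lets us skip
         * this block, honor the vacuum cost delay, then read, pin,
         * and lock the buffer (setting buf and page).
         */

        /* The real work: prune, freeze, collect dead items for indexes */
        lazy_scan_prune(vacrel, buf, blkno, page, vistest, &prunestate);

        /*
         * Postscript: record free space in the FSM and set visibility
         * map bits, based on what prunestate reports about the page.
         */
    }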

> * The new comment added which begins "Even if we skipped heap vacuum,
> ..." is good, but perhaps it could be more optimistic. It seems to me
> that it's not just that it *could* be worthwhile because we *could*
> have updated freespace, but that those things are in fact probable.

Will fix.

> * I'm not really a huge fan of comments that include step numbers,
> because they tend to cause future patches to have to change a bunch of
> comments every time somebody adds a new step, or, less commonly,
> removes an old one. I would suggest revising the comments you've added
> that say things like "Step N for block: X" to just "X". I do like the
> comment additions, just not the attributing of specific numbers to
> specific steps.

I see your point.

I added the numbers because I wanted the reader to notice a parallel
construction among these related high-level comments. I wanted each to
act as a bullet point that frames both code and related interspersed
low-level comments (without the risk of it looking like just another
low-level comment).

The numbers are inessential. Maybe I could do "Next step: " at the
start of each comment instead. Leave it with me.

> * As in 0001, core logical changes are obscured by moving code and
> changing it in the same patch. All this logic gets moved into
> lazy_scan_prune() and revised at the same time. Using git diff
> --color-moved -w sorta works, but even then there are parts of it that
> are pretty hard to read, because there's a bunch of other stuff that
> gets rejiggered at the same time.

Theoretically, nothing really changes until the third patch, except
for the way that we do the INDEX_CLEANUP=off stuff.

What you say about 0001 is understandable and not that surprising. But
FWIW 0002 doesn't move code around in the same way as 0001 -- it is
not nearly as mechanical. Like I said, let me see if the lazy_scan_*
functions from 0002 are adding much (aside from lazy_scan_prune(), of
course). To be honest they were a bit of an afterthought --
lazy_scan_prune() was my focus in 0002.

I can imagine adding a lot more stuff to lazy_scan_prune() in the
future, too. For example, maybe we can do more freezing earlier, based
on whether or not we can avoid dirtying the page by not freezing. The
structure of lazy_scan_prune() makes that kind of thing possible,
because it gets to see the page before and after both pruning and
freezing -- and with the retry stuff in 0003, it can back out of an
earlier decision not to freeze as soon as it realizes that the page is
getting dirtied either way.
Also, I think we might end up collecting more complicated information
that informs our eventual decision about whether or not indexes need
to be vacuumed.
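
To illustrate the kind of decision I mean, with made-up variable names
that aren't in any of the patches:

    /*
     * Hypothetical fragment inside lazy_scan_prune(): freeze
     * opportunistically when pruning has already dirtied the page.
     * All of the names here are invented for illustration.
     */
    int     ndeleted;           /* tuples removed by pruning */
    int     nnewlpdead;         /* new LP_DEAD items left by pruning */
    int     nfreeze_eligible;   /* tuples that could be frozen now */
    bool    freeze_now;

    /* ... pruning runs and fills in the counters above ... */

    if (ndeleted > 0 || nnewlpdead > 0)
    {
        /*
         * The page is being dirtied (and WAL-logged) regardless, so
         * freezing the eligible tuples now is nearly free compared
         * with revisiting the page later just to freeze.
         */
        freeze_now = (nfreeze_eligible > 0);
    }
    else
    {
        /* Freezing would be the only reason to dirty the page; hold off */
        freeze_now = false;
    }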

> My concentration is flagging a bit so I'm going to stop reviewing here
> for now. I'm not deeply opposed to any of what I've seen so far. My
> main criticism is that I think more thought should be given to how
> things are named and to separating minimal code-movement patches from
> other changes.

That seems totally reasonable.

Thanks again!

--
Peter Geoghegan
