Re: Deleting older versions in unique indexes to avoid page splits

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Victor Yegorov <vyegorov(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Deleting older versions in unique indexes to avoid page splits
Date: 2021-01-18 20:43:41
Message-ID: CAH2-Wzno9q1Wj7a9rk5G4jQe7BupRe5y6NVobVw8EiVLDcad8w@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 18, 2021 at 6:11 AM Victor Yegorov <vyegorov(at)gmail(dot)com> wrote:
> If I understand you correctly, you suggest to track _all_ the hints that came
> from the executor for possible non-HOT logical duplicates somewhere on
> the page. And when we hit the no-space case, we'll check not only the last
> item being hinted, but all items on the page, which makes it more probable
> to kick in and do something.

> Also, I'm not sure where to put it. We've deprecated the BTP_HAS_GARBAGE
> flag, maybe it can be reused for this purpose?

There actually was a similar flag (named BTP_UNCHANGED_UPDATE and
later BTP_HAS_DUPS) that appeared in earlier versions of the patch
(several versions in total, up to and including v6). This was never
discussed on the list because I assumed that it wouldn't be helpful
(I was already writing too many overlong e-mails). The question was
still unsettled in my mind at the time, so it didn't make sense to
start discussing it. I changed my mind at some point, and so the flag
never came up again until now, when Amit raised the question.
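
To make the idea concrete, the abandoned flag worked roughly like
this (a toy model written for this e-mail, not the actual v6 patch
code -- the bit value, struct, and function names are all invented
for illustration):

    #include <stdbool.h>
    #include <stdint.h>

    #define BTP_HAS_DUPS  (1 << 10)   /* bit value invented for illustration */

    typedef struct LeafPageOpaque
    {
        uint16_t    btpo_flags;       /* page status bits, as in nbtree */
    } LeafPageOpaque;

    /*
     * Called for each incoming index tuple: latch the executor's
     * "probable non-HOT duplicate" hint into the page itself.
     */
    static void
    remember_dup_hint(LeafPageOpaque *opaque, bool logically_unchanged)
    {
        if (logically_unchanged)
            opaque->btpo_flags |= BTP_HAS_DUPS;
    }

    /*
     * Called when the page runs out of space: with the flag, a
     * bottom-up deletion pass is attempted whenever *any* earlier
     * insertion carried the hint, not just when the current incoming
     * tuple does.
     */
    static bool
    should_try_bottomup_deletion(const LeafPageOpaque *opaque)
    {
        return (opaque->btpo_flags & BTP_HAS_DUPS) != 0;
    }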

Looking back on my personal notes, I am reminded that I debated this
exact question with myself at length. The argument for keeping the
nbtree flag (i.e. what Amit is arguing for now) involved an isolated
example that seems very similar to the much more recent example from
Amit (that he arrived at independently). I am at least sympathetic to
this view of things, even now. Let me go into why I changed my mind
after a couple of months of on and off deliberation.

In general, the highly speculative extra work that heapam.c does for
index deleters in the bottom-up caller case can only be justified
because the cost is paid by non-HOT updaters that are definitely
about to split the page just to fit another version, and because we
only risk wasting one heap page access at any given point in the
entire process (the latter point about risk/cost is not 100% true,
since there are additional fixed CPU costs and the like, but it's at
least true in spirit). We can justify "gambling" like this only
because the game is rigged in our favor to an *absurd* degree: there
are many ways to win big (relative to the likely baseline case of a
version churn page split), and we only have to put down relatively
small "bets" at any given point -- heapam.c gives up on the entire
pass as soon as it encounters one whole heap page that lacks a single
deletable entry, no matter what the reason is.
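
To illustrate the "small bets" structure, here is a toy model of the
cost cap (not the real heapam.c code -- the type and helper names are
invented for this sketch):

    typedef struct HeapBlockCandidate HeapBlockCandidate;  /* stand-in type */

    /* invented helper: try to delete index entries pointing into one
     * heap block, returning how many were found deletable */
    extern int delete_from_heap_block(HeapBlockCandidate *block);

    /*
     * Visit promising heap blocks in speculative order.  The pass ends
     * the moment one whole visited heap block yields no deletable
     * index entries at all, which caps the downside at roughly one
     * wasted heap page access.
     */
    static int
    bottomup_pass(HeapBlockCandidate *blocks, int nblocks)
    {
        int     ndeleted_total = 0;

        for (int i = 0; i < nblocks; i++)
        {
            int     ndeleted = delete_from_heap_block(&blocks[i]);

            if (ndeleted == 0)
                break;          /* first fruitless heap page: stop betting */

            ndeleted_total += ndeleted;
        }

        return ndeleted_total;
    }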

The algorithm *appears* to behave very intelligently when seen from
afar, but is in fact stupid and opportunistic when you look at it up
close. We can afford to be so permissive about the upside benefit
only by being very conservative about the downside cost. Almost all of
our individual inferences can be wrong, and yet we still win in the
end. And the effect is robust over time. You could say that it is an
organic approach: it adapts to the workload, rather than trying to
make the workload adapt to it. This is less radical than you'd think
-- in some sense this is how B-Trees have always worked.

In the end, I couldn't justify imposing this cost on anything other
than a non-HOT updater, which is what the flag proposal would require
me to do. With the flag, it's no longer 100% clear that the relative
cost of each "bet" placed in heapam.c really is extremely low (we can
no longer say for sure that we have very little to lose, given the
certain alternative of a version churn page split). The fact is that
"version chains in indexes" still cannot get very long. Plus there
are other, subtler reasons why this is unlikely to be a problem (e.g.
the LP_DEAD deletion mechanism also got much better, deduplication
can combine with bottom-up deletion so that we trigger a useful
bottom-up deletion pass sooner or later without much downside, and
possibly other things I haven't even thought of).
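
For context, the full set of fallbacks that now stands between a full
leaf page and an actual page split looks something like this in
outline (a simplified sketch of the nbtree insert path, with invented
type and helper names, not the actual code):

    #include <stdbool.h>
    #include <stddef.h>

    typedef void *Page;                 /* stand-in for nbtree's Page */

    /* invented helpers standing in for the real nbtree/heapam steps */
    extern bool page_has_lp_dead_items(Page page);
    extern bool executor_hinted_duplicates(Page page);
    extern bool enough_space(Page page, size_t newitemsz);
    extern void simple_deletion_pass(Page page);
    extern void bottomup_deletion_pass(Page page);
    extern void deduplication_pass(Page page);
    extern void split_page(Page page);

    static void
    handle_full_leaf_page(Page page, size_t newitemsz)
    {
        /* Cheapest option first: items already marked LP_DEAD */
        if (page_has_lp_dead_items(page))
            simple_deletion_pass(page);
        else if (executor_hinted_duplicates(page))
            bottomup_deletion_pass(page);   /* the speculative pass */

        /* Deduplication can buy time even when nothing is deletable */
        if (!enough_space(page, newitemsz))
            deduplication_pass(page);

        /* Last resort: a genuine version churn page split */
        if (!enough_space(page, newitemsz))
            split_page(page);
    }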

It's possible that I have been too conservative. I admit that my
decision on this point is at least a little arbitrary, but I stand by
it for now. I cannot feel too bad about theoretically leaving some
gains on the table, given that we're only talking about deleting a
group of related versions a little later than we would otherwise, and
only in some circumstances, and in a way that seems likely to be
imperceptible to users. I reserve the right to change my mind about
it, but for now it doesn't look like an absurdly good deal, and those
are the kind of deals that it makes sense to focus on here. I am very
happy about the fact that it is relatively easy to understand why the
worst case for bottom-up index deletion cannot be that bad.

--
Peter Geoghegan
