Re: Deleting older versions in unique indexes to avoid page splits

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Victor Yegorov <vyegorov(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Deleting older versions in unique indexes to avoid page splits
Date: 2021-01-07 23:07:56
Message-ID: CAH2-WzmROa1eqEDRDQNWNuuHOy-8Csqnt12myazT2b-jb6sCXA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 4, 2021 at 8:28 AM Victor Yegorov <vyegorov(at)gmail(dot)com> wrote:
> I've looked through the v12 patch.
>
> I like the new outline:
>
> - _bt_delete_or_dedup_one_page() is the main entry for the new code
> - first we try _bt_simpledel_pass() does improved cleanup of LP_DEAD entries
> - then (if necessary) _bt_bottomupdel_pass() for bottomup deletion
> - finally, we perform _bt_dedup_pass() to deduplication
>
> We split the leaf page only if all the actions above failed to provide enough space.

I'm thinking of repeating the LP_DEAD enhancement within GiST and hash
shortly after this, as follow-up work. Their implementation of LP_DEAD
deletion was always based on the nbtree original, and I think that it
makes sense to keep everything in sync. The simple deletion
enhancement probably makes just as much sense in these other places.

> + * Only do this for key columns. A change to a non-key column within an
> + * INCLUDE index should not be considered because that's just payload to
> + * the index (they're not unlike table TIDs to the index AM).
>
> The last part of it (in the parenthesis) is difficult to grasp due to
> the double negation (not unlike). I think it's better to rephrase it.

Okay, will do.

> 2. After reading the patch, I also think, that fact, that index_unchanged_by_update()
> and index_unchanged_by_update_var_walker() return different bool states
> (i.e. when the latter returns true, the first one returns false) is a bit misleading.

> Although logic as it is looks fine, maybe index_unchanged_by_update_var_walker()
> can be renamed to avoid this confusion, to smth like index_expression_changed_walker() ?

I agree. I'll use the name index_expression_changed_walker() in the
next revision.

> 2. I'm not sure the bottomup_delete_items parameter is very helpful. In order to disable
> bottom-up deletion, DBA needs somehow to measure it's impact on a particular index.
> Currently I do not see how to achieve this. Not sure if this is overly important, though, as
> you have a similar parameter for the deduplication.

I'll take bottomup_delete_items out, since I think that you may be
right about that. It can be another "decision to recheck mid-beta"
thing (on the PostgreSQL 14 open items list), so we can delay making a
final decision on it until after it gets tested by the broadest
possible group of people.

> 3. It feels like indexUnchanged is better to make indexChanged and negate its usage in the code.
> As !indexChanged reads more natural than !indexUnchanged, at least to me.

I don't want to do that because then we're negating "indexChanged" as
part of our gating condition to the bottom-up deletion pass code. To
me this feels wrong, since the hint only exists to be used to trigger
index tuple deletion. This is unlikely to ever change.

> First, I run a 2-hour long case with the same setup as I used in my e-mail from 15 of November.
> I found no difference between patch and master whatsoever. Which makes me think, that current
> master is quite good at keeping better bloat control (not sure if this is an effect of
> 4228817449 commit or deduplication).

Commit 4228817449 can't have improved the performance of the master
branch -- that commit was just about providing a *correct*
latestRemovedXid value. It cannot affect how many index tuples get
deleted per pass, or anything like that.

Not surprised that you didn't see many problems with the master branch
on your first attempt. It's normal for there to be non-linear effects
with this stuff, and these are very hard (maybe even impossible) to
model. For example, even with artificial test data you often see
distinct "waves" of page splits, which is a phenomenon pretty well
described by the "Waves of misery after index creation" paper [1].
It's likely that your original stress test case (the 15 November one)
would have shown significant bloat if you just kept it up for long
enough. One goal for this project is to make the performance
characteristics more predictable. The performance characteristics are
unpredictable precisely because they're pathological. B-Trees will
always have non-linear performance characteristics, but they can be
made a lot less chaotic in practice.

To be honest, I was slightly surprised that your more recent stress
test (the queue thing) worked out so well for the patch. But not too
surprised, because I don't necessarily expect to understand how the
patch can help for any given workload -- this is dynamic behavior
that's part of a complex system. I first thought that maybe the
presence of the INSERTs and DELETEs would mess things up. But I was
wrong -- everything still worked well, probably because bottom-up
index deletion "cooperated" with deduplication. The INSERTs could not
trigger a bottom-up deletion (which might have been useful for these
particular INSERTs), but that didn't actually matter because
deduplication was triggered instead, which "held the line" for long
enough for bottom-up deletion to eventually get triggered (by non-HOT
UPDATEs).

As I've said before, I believe that the workload should "figure out
the best strategy for handling bloat on its own". A diversity of
strategies are available, which can adapt to many different kinds of
workloads organically. That kind of approach is sometimes necessary
with a complex system IMV.

[1] https://btw.informatik.uni-rostock.de/download/tagungsband/B2-2.pdf
--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuro Yamada 2021-01-07 23:52:02 Re: list of extended statistics on psql
Previous Message Peter Geoghegan 2021-01-07 22:03:59 Re: data_checksums enabled by default (was: Move --data-checksums to common options in initdb --help)