Re: Deleting older versions in unique indexes to avoid page splits

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Victor Yegorov <vyegorov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Deleting older versions in unique indexes to avoid page splits
Date: 2020-12-19 23:59:59
Message-ID: CAH2-Wzmj3TVrVeyJ5GFz=nnqO4wUuAJ0-L84qLYOWmb=AEHfQw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 9, 2020 at 5:12 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Most of the real changes in v11 (compared to v10) are in heapam.c.
> I've completely replaced the table_compute_xid_horizon_for_tuples()
> interface with a new interface that supports all existing requirements
> (from index deletions that support LP_DEAD deletion), while also
> supporting these new requirements (e.g. bottom-up index deletion).

I came up with a microbenchmark that is designed to give some general
sense of how helpful it is to include "extra" TIDs alongside
LP_DEAD-in-index TIDs (when they happen to point to the same table
block as the LP_DEAD-in-index TIDs, which we'll have to visit anyway).
The basic idea is simple: Add custom instrumentation that summarizes
each B-Tree index deletion in LOG output, and then run the regression
tests. Attached is the result of this for a "make installcheck-world".

If you're just looking at this thread for the first time in a while:
note that what I'm about to go into isn't technically bottom-up
deletion (though you will see some of that in the full log output if
you look). It's a closely related optimization that only appeared in
recent versions of the patch. So I'm comparing the current approach
(simple deletion of LP_DEAD-marked index tuples) to a new, enhanced
approach (one that makes it a little more like bottom-up deletion, but
only a little).

Here is some sample output (selected almost at random from a text file
consisting of 889 lines of similar output):

... exact TIDs deleted 17, LP_DEAD tuples 4, LP_DEAD-related table blocks 2 )
... exact TIDs deleted 38, LP_DEAD tuples 28, LP_DEAD-related table blocks 1 )
... exact TIDs deleted 39, LP_DEAD tuples 1, LP_DEAD-related table blocks 1 )
... exact TIDs deleted 44, LP_DEAD tuples 42, LP_DEAD-related table blocks 3 )
... exact TIDs deleted 6, LP_DEAD tuples 2, LP_DEAD-related table blocks 2 )

(The initial contents of each line were snipped here, to focus on the
relevant metrics.)
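(If it helps anyone reproduce this analysis, the per-line metrics in the
attached log can be tallied with a short script along these lines. This is
only a sketch; it assumes each instrumentation line ends with exactly the
three metrics shown above, and the field names in the result are mine.)

```python
import re

# Each instrumentation line ends with these three metrics (the leading
# context is snipped, as in the sample lines above).
PATTERN = re.compile(
    r"exact TIDs deleted (\d+), LP_DEAD tuples (\d+), "
    r"LP_DEAD-related table blocks (\d+)"
)

def summarize(lines):
    """Aggregate deletion metrics across all instrumentation lines."""
    deleted = lp_dead = blocks = nops = 0
    for line in lines:
        m = PATTERN.search(line)
        if m:
            deleted += int(m.group(1))
            lp_dead += int(m.group(2))
            blocks += int(m.group(3))
            nops += 1
    return {"operations": nops, "tids_deleted": deleted,
            "lp_dead": lp_dead, "extra": deleted - lp_dead}

sample = [
    "... exact TIDs deleted 17, LP_DEAD tuples 4, LP_DEAD-related table blocks 2 )",
    "... exact TIDs deleted 38, LP_DEAD tuples 28, LP_DEAD-related table blocks 1 )",
]
print(summarize(sample))
# -> {'operations': 2, 'tids_deleted': 55, 'lp_dead': 32, 'extra': 23}
```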

Here we see that the actual number of TIDs/index tuples deleted often
*vastly* exceeds the number of LP_DEAD-marked tuples (which are all we
would have been able to delete with the existing approach of just
deleting LP_DEAD items). It's pretty rare for us to fail to at least
delete a couple of extra TIDs. Clearly this technique is broadly
effective, because in practice there are significant locality-related
effects that we can benefit from. It doesn't really matter that it's
hard to precisely describe all of these locality effects. IMV the
question that really matters is whether the cost of trying is
consistently very low (relative to the cost of our current approach to
simple LP_DEAD deletion). That part we do need to understand fully.

It's tempting to think about this quantitatively (and it also bolsters
the case for the patch), but that misses the point. The right way to
think about this is as a *qualitative* thing. The presence of LP_DEAD
bits gives us certain reliable information about the nature of the
index tuple (that it is dead-to-all, and therefore safe to delete),
but it also *suggests* quite a lot more than that. In practice bloat
is usually highly correlated/concentrated, especially when we limit
our consideration to workloads where bloat is noticeable in any
practical sense. So we're very well advised to look for nearby
deletable index tuples in passing -- especially since it's practically
free to do so. (Which is what the patch does here, of course.)
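As a toy sketch of the heuristic (names and data shapes are mine, purely
illustrative -- the real code works with item pointers on a B-Tree leaf
page in C, not Python tuples): treat each TID as a (table block, offset)
pair, and widen the candidate set to every TID that lands on a table
block we already have to visit for an LP_DEAD item.

```python
def deletion_candidates(index_tuples):
    """index_tuples: list of (tid, lp_dead) pairs, tid = (table_block, offset).
    Return every TID worth checking for deletion: the LP_DEAD-marked ones,
    plus any "extra" TID pointing into a table block we must visit anyway."""
    promising_blocks = {tid[0] for tid, lp_dead in index_tuples if lp_dead}
    return [tid for tid, lp_dead in index_tuples
            if lp_dead or tid[0] in promising_blocks]

page = [((7, 1), True),   # LP_DEAD: forces a visit to table block 7
        ((7, 2), False),  # extra TID, same block -- practically free to check
        ((9, 5), False)]  # different block: skipped, would mean extra I/O
print(deletion_candidates(page))  # -> [(7, 1), (7, 2)]
```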

Let's compare this to an extreme variant of my patch that runs the
same test, to see what changes. Consider a variant that exhaustively
checks every index tuple on the page at the point of a simple LP_DEAD
deletion operation, no matter what. Rather than only including those
TIDs that happen to be on the same heap/table blocks (and thus are
practically free to check), we include all of them. This design isn't
acceptable in the real world because it does a lot of unnecessary I/O,
but that shouldn't invalidate any comparison I make here. This is
still a reasonable approximation of a version of the patch with
magical foreknowledge of where to find dead TIDs. It's an Oracle
(ahem) that we can sensibly compare to the real patch within certain
constraints.
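In the same toy terms (again, just an illustrative sketch in Python, not
the real C code), the oracle variant simply drops the same-block filter:

```python
def oracle_candidates(index_tuples):
    """The exhaustive "oracle": every TID on the leaf page is a deletion
    candidate, regardless of which table block it points into. Unacceptable
    random I/O in practice, but a useful upper bound for comparison."""
    return [tid for tid, _lp_dead in index_tuples]

page = [((7, 1), True), ((7, 2), False), ((9, 5), False)]
print(oracle_candidates(page))  # -> [(7, 1), (7, 2), (9, 5)]
```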

The results of this comparative analysis seem to suggest something
important about the general nature of what's going on: there are only
844 deletion operations in total with the oracle. That is fewer than the
real patch's 889 deletion operations, but only slightly fewer -- you
would expect a much bigger improvement from something that is, after
all, supposed to apply magic! This suggests to me that the precise
intervention of the patch here (the new LP_DEAD deletion stuff) has an
outsized impact. The correlations that naturally exist make this
asymmetrical payoff-to-cost situation possible. Simple cheap
heuristics once again go a surprisingly long way, kind of like
bottom-up deletion itself.

--
Peter Geoghegan

Attachment Content-Type Size
delete_stats_regression_tests.txt.gz application/gzip 7.5 KB
