Re: Deleting older versions in unique indexes to avoid page splits

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Victor Yegorov <vyegorov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Deleting older versions in unique indexes to avoid page splits
Date: 2021-01-20 21:06:55
Message-ID: CAH2-WzkfGuiU1oqOXSdbawYUtwCACOvmnwuog1nt71seN7YXEA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 20, 2021 at 10:53 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> This patch is unusual in that you really need to think about emergent
> behaviors to understand it. That is certainly a difficult thing to do,
> and it's understandable that even an expert might not grok it without
> considering it carefully.

I happened to stumble upon a recent blog post that seems like a light,
approachable introduction to some of the key concepts here:

https://jessitron.com/2021/01/18/when-costs-are-nonlinear-keep-it-small/

Bottom-up index deletion enhances a complex system whose maintenance
costs are *dramatically* nonlinear, at least in many important cases.
If you apply linear thinking to such a system then you'll probably end
up with a bad design.

The system as a whole is made efficient by making sure that we're lazy
when that makes sense, while also making sure that we're eager when
that makes sense. So it almost *has* to be structured as a bottom-up,
reactive mechanism -- no other approach is able to ramp up or down in
exactly the right way. Talking about small cost differences (things
that can easily be empirically measured, perhaps with a
microbenchmark) is almost irrelevant to the big picture. It's even
irrelevant to the "medium picture".

What's more, it's basically a mistake to think of heap page accesses
that don't yield any deletable index tuples as wasted effort (even
though that's how I describe them myself!). Here's why: we have to
access the heap page to learn that it has nothing for us in the first
place place! If we somehow knew ahead of time that some useless-to-us
heap block was useless, then the whole system wouldn't be bottom-up
(by definition). In other words, failing to get any index tuple
deletes from an entire heap page *is itself a form of feedback* at the
local level -- it guides the entire system's behavior over time. Why
should we expect to get that information at zero cost?

This is somehow both simple and complicated, which creates huge
potential for miscommunications. I tried to describe this in various
ways at various points. Perhaps I could have done better with that.

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message James Hilliard 2021-01-20 21:52:19 Re: [PATCH 1/1] Fix detection of pwritev support for OSX.
Previous Message Pavel Stehule 2021-01-20 21:03:51 Re: poc - possibility to write window function in PL languages