Re: Deleting older versions in unique indexes to avoid page splits

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Deleting older versions in unique indexes to avoid page splits
Date: 2020-10-22 13:18:36
Message-ID: CA+TgmoayNWZGtTgB+aALsGXjKtQcuOsFDNREZ4i1MVLUp2TXkQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 21, 2020 at 3:36 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Bear in mind that we actually do practically the same thing all the
> time with the current LP_DEAD setting stuff, where we need to call
> compute_xid_horizon_for_tuples/heap_compute_xid_horizon_for_tuples
> with a leaf buffer lock held in almost the same way. That's actually
> potentially far worse if you look at it in isolation, because you
> could potentially have hundreds of heap pages, whereas this is just 1
> - 3. (BTW, next version will also do that work in passing, so you're
> practically guaranteed to do less with a buffer lock held compared to
> the typical case of nbtree LP_DEAD setting, even without counting how
> the LP_DEAD bits get set in the first place.)

That's fair. It's not that I'm trying to enforce some absolute coding
rule, as if I even had the right to do such a thing. But people have
sometimes proposed patches that would have caused major regressions in
this area, and I think it's really important that we avoid that.
Normally, it doesn't matter: I/O requests complete quickly and
everything is fine. But, on a system where things are malfunctioning,
it makes a big difference whether you can regain control by hitting
^C. I expect you've probably at some point had the experience of being
unable to recover control of a terminal window because some process
was stuck in wait state D on the kernel level, and you probably
thought, "well, this sucks." It's even worse if the kernel's entire
process table fills up with such processes. This kind of thing is
essentially the same issue at the database level, and it's smart to do
what we can to mitigate it.

But that being said, I'm not trying to derail this patch. It isn't,
and shouldn't be, the job of this patch to solve that problem. It's
just better if it doesn't regress things, or maybe even (as you say)
makes them a little better. I think the idea you've got here is
basically good, and a lot of it comes down to how well it works in
practice. I completely agree that looking at amortized cost rather
than worst-case cost is a reasonable principle here; you can't take
that to a ridiculous extreme because people also care about
consistency of performance, but it seems pretty clear from your
description that your patch should not create any big problem in that
area, because the worst-case number of extra buffer accesses for a
single operation is tightly bounded. And, of course, containing index
bloat is its own reward.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
