Re: Deleting older versions in unique indexes to avoid page splits

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Deleting older versions in unique indexes to avoid page splits
Date: 2020-10-14 14:07:54
Message-ID: a1d3c9d3-f070-0bd0-8de2-49184e21b35c@postgrespro.ru
Lists: pgsql-hackers

On 08.10.2020 02:48, Peter Geoghegan wrote:
> On Tue, Jun 30, 2020 at 5:03 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>> Attached is a POC patch that teaches nbtree to delete old duplicate
>> versions from unique indexes. The optimization targets non-HOT
>> duplicate version bloat. Although the patch is rather rough, it
>> nevertheless manages to more or less eliminate a whole class of index
>> bloat: Unique index bloat from non-HOT updates in workloads where no
>> transaction lasts for more than a few seconds.
> I'm slightly surprised that this thread didn't generate more interest
> back in June. After all, maintaining the pristine initial state of
> (say) a primary key index even after many high throughput non-HOT
> updates (i.e. avoiding "logically unnecessary" page splits entirely)
> is quite appealing. It arguably goes some way towards addressing long
> held criticisms of our approach to MVCC. Especially if it can be
> generalized to all b-tree indexes -- the Uber blog post mentioned
> tables that have several indexes, which presumably means that there
> can be no HOT updates (the author of the blog post didn't seem to be
> aware of HOT at all).
The idea seems very promising, especially when extended to handle
non-unique indexes too.
> I've been trying to generalize my approach to work with all indexes. I
> think that I can find a strategy that is largely effective at
> preventing version churn page splits that take place with workloads
> that have many non-HOT updates, without any serious downsides for
> workloads that do not benefit. I want to get feedback on that now,
> since I expect that it will be controversial. Teaching indexes about
> how tuples are versioned or chaining tuples seems like a non-starter,
> so the trick will be to come up with something that behaves in
> approximately the same way as that in cases where it helps.
>
> The approach I have in mind is to pass down a hint from the executor
> to btinsert(), which lets nbtree know that the index tuple insert is
> in fact associated with a non-HOT update. This hint is only given when
> the update did not logically modify the columns that the index covers.
That's exactly what I wanted to discuss after the first message. If we
could make the (non-)HOT-update distinction index-specific, I think it
could improve performance a lot.
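
To make the proposed hint concrete for myself, here is a rough
standalone sketch in C of the executor side -- purely illustrative,
with invented names rather than code from the patch: the executor
already knows which attributes a non-HOT UPDATE modified, so it can
derive a per-index "unchanged" flag and pass it down to btinsert().

#include <stdbool.h>

typedef int AttrNumber;

typedef struct IndexDesc
{
    int         nkeycols;
    AttrNumber  keycols[32];    /* attnos of this index's key columns */
} IndexDesc;

/*
 * Returns true when the UPDATE modified none of this index's key
 * columns, i.e. the new index tuple is a logically unchanged
 * duplicate.  This result is what would be passed down as the hint.
 */
static bool
index_unchanged_by_update(const IndexDesc *idx,
                          const bool *updated_attrs)  /* by attno */
{
    for (int i = 0; i < idx->nkeycols; i++)
    {
        if (updated_attrs[idx->keycols[i]])
            return false;
    }
    return true;
}
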
> Here is the maybe-controversial part: The algorithm initially assumes
> that all indexes more or less have the same properties as unique
> indexes from a versioning point of view, even though that's clearly
> not true. That is, it initially assumes that the only reason why there
> can be duplicates on any leaf page it looks at is because some
> previous transaction also did a non-HOT update that added a new,
> unchanged index tuple. The new algorithm only runs when the hint is
> passed down from the executor and when the only alternative is to
> split the page (or have a deduplication pass), so clearly there is
> some justification for this assumption -- it really is highly unlikely
> that this update (which is on the verge of splitting the page) just so
> happened to be the first such update that affected the page.
> To be blunt: It may be controversial that we're accessing multiple
> heap pages while holding an exclusive lock on a leaf page, in the
> hopes that we can avoid a page split, but without any certainty that
> it'll work out.
>
> Sometimes (maybe even most of the time), this assumption turns out to
> be mostly correct, and we benefit in the obvious way -- no
> "unnecessary" page splits for affected non-unique indexes. Other times
> it won't work out, usually because the duplicates are in fact logical
> duplicates from distinct logical rows.
I think that this optimization can affect low-cardinality indexes
negatively, but it is hard to estimate the impact without tests. Maybe it
won't be a big deal, given that we attempt to eliminate old copies not
very often and that low-cardinality b-trees are already not very useful.
Besides, we can always make this behavior optional, so that users can
tune it to their workload.
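
To restate the triggering condition as I understand it, a rough sketch
(again with invented names, not code taken from the patch): the
speculative heap-visiting pass is considered only when the executor
hint was passed down and the leaf page has no room left, so the only
alternatives are a page split or a deduplication pass.

#include <stdbool.h>

typedef struct BTInsertContext
{
    bool    non_hot_update_hint;    /* executor hint: non-HOT update,
                                     * index columns unchanged */
    bool    page_would_split;       /* no room for the new tuple */
} BTInsertContext;

/* Stub: visit heap pages of duplicates, remove tuples dead to all. */
static bool
try_delete_dead_duplicates(BTInsertContext *ctx)
{
    (void) ctx;
    return false;               /* elided */
}

/* Returns true if enough space was freed to avoid the page split. */
static bool
maybe_delete_old_versions(BTInsertContext *ctx)
{
    if (!ctx->non_hot_update_hint || !ctx->page_would_split)
        return false;           /* normal path: dedup pass or split */

    /*
     * Bet that the duplicates on this leaf page are dead versions left
     * behind by earlier non-HOT updates.  The heap accesses happen
     * while an exclusive lock is held on the leaf page, and may not
     * pay off at all.
     */
    return try_delete_dead_duplicates(ctx);
}
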

I wonder how this new feature will interact with physical replication.
A replica may have a quite different performance profile. For example,
there can be long-running queries that currently prevent vacuum from
removing recently-dead rows. How will we handle the same situation with
this optimized deletion?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
