Re: Deleting older versions in unique indexes to avoid page splits

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Deleting older versions in unique indexes to avoid page splits
Date: 2020-10-14 14:40:14
Message-ID: CAH2-WzmEic9JJ_NJXWo9frRgTqg7q8YuOfew_et3UxJt6zUPfg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 14, 2020 at 7:07 AM Anastasia Lubennikova
<a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
> The idea seems very promising, especially when extended to handle non-unique indexes too.

Thanks!

> That's exactly what I wanted to discuss after the first letter. If we could make (non)HOT-updates index specific, I think it could improve performance a lot.

Do you mean accomplishing the same goal in heapam, by making the
optimization more intelligent about which indexes need new versions?
We did have a patch that did that in 2016, as you may recall -- this
was called WARM:

https://www.postgresql.org/message-id/flat/CABOikdMNy6yowA%2BwTGK9RVd8iw%2BCzqHeQSGpW7Yka_4RSZ_LOQ%40mail.gmail.com

This didn't go anywhere. I think that this solution is more
pragmatic. It's cheap enough that we can remove it if a better
solution becomes available in the future. But this is a pretty good
solution by all important measures.

> I think that this optimization can affect low cardinality indexes negatively, but it is hard to estimate impact without tests. Maybe it won't be a big deal, given that we attempt to eliminate old copies not very often and that low cardinality b-trees are already not very useful. Besides, we can always make this thing optional, so that users could tune it to their workload.

Right. The trick is to pay only a fixed low cost when we start out
(maybe as low as one heap page access), to ratchet it up only if the
first heap page access looks promising, and to avoid posting list
tuples. Regular deduplication takes place when this fails.
Deduplication is useful for the usual reasons, but also because this
new mechanism learns not to try the posting list TIDs.
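
To make the shape of that heuristic a bit more concrete, here is a
minimal standalone C sketch of the ratcheting idea. Nothing below is
actual PostgreSQL code: CandidateTuple, heap_page_yields_deletions,
and the budget constants are all hypothetical stand-ins. The point is
just the control flow: start with a budget of one heap page access,
skip posting list tuples entirely, and raise the budget (up to a
fixed ceiling) only when an access actually turns up deletable
versions.

/*
 * Standalone sketch of the cost heuristic described above.
 * NOT PostgreSQL code; all names here are hypothetical stand-ins.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define INITIAL_PAGE_BUDGET 1   /* start with one heap page access */
#define MAX_PAGE_BUDGET     8   /* hard ceiling keeps the cost bounded */

typedef struct CandidateTuple
{
    int  heap_block;        /* heap block the index tuple points to */
    bool is_posting_list;   /* deduplicated tuple carrying many TIDs */
} CandidateTuple;

/*
 * Stand-in for the real heap access: does visiting this heap block
 * turn up obsolete versions that we are allowed to delete?
 */
static bool
heap_page_yields_deletions(int heap_block)
{
    return heap_block % 2 == 0;     /* arbitrary stub behavior */
}

/*
 * Returns true if the pass freed space, letting the caller avoid a
 * page split.  On false, the caller falls back to deduplication.
 */
static bool
bottom_up_delete_sketch(const CandidateTuple *cands, size_t ncands)
{
    int  budget = INITIAL_PAGE_BUDGET;
    int  accesses = 0;
    bool freed_space = false;

    for (size_t i = 0; i < ncands && accesses < budget; i++)
    {
        /*
         * Skip posting list tuples: their TIDs already survived
         * deduplication, so chasing them is unlikely to pay off.
         */
        if (cands[i].is_posting_list)
            continue;

        accesses++;
        if (heap_page_yields_deletions(cands[i].heap_block))
        {
            freed_space = true;
            /* A promising access ratchets the budget up, capped. */
            if (budget < MAX_PAGE_BUDGET)
                budget++;
        }
    }
    return freed_space;
}

int
main(void)
{
    CandidateTuple cands[] = {
        {4, false}, {7, true}, {2, false}, {9, false}
    };

    printf("freed space: %d\n",
           bottom_up_delete_sketch(cands,
                                   sizeof(cands) / sizeof(cands[0])));
    return 0;
}

In this sketch a false return is where the caller would attempt
regular deduplication next, splitting the page only as a last resort.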

> I wonder, how this new feature will interact with physical replication? Replica may have quite different performance profile.

I think of that as equivalent to having a long running transaction on
the primary. When I first started working on this patch I thought
about having "long running transaction detection". But I quickly
realized that that isn't a meaningful concept. A transaction is only
truly long running relative to writes that leave behind obsolete row
versions that cannot be cleaned up. It has to be
something we can deal with, but it cannot be meaningfully
special-cased.

--
Peter Geoghegan
