Re: Deleting older versions in unique indexes to avoid page splits

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Victor Yegorov <vyegorov(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Deleting older versions in unique indexes to avoid page splits
Date: 2020-11-12 23:00:49
Message-ID: CAH2-Wzm46rS1JDmRO0fbnQDpYyD2PrX2cYsa8-5=nhp7nT_w9A@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 11, 2020 at 12:58 PM Victor Yegorov <vyegorov(at)gmail(dot)com> wrote:
> On the other hand, there's quite a big drop in UPDATE throughput. For sure, undersized shared_buffers
> contributes to this drop. Still, my experience tells me that under the conditions at hand (HOT disabled due to an
> index over the update time column), tables will tend to accumulate bloat and produce unnecessary IO, including from WAL.

I think that the big SELECT statement with an "ORDER BY mtime ... "
was a good way of demonstrating the advantages of the patch.
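
To make that concrete for anyone else following along, the general
shape I have in mind is something like this -- a sketch only, with
made-up table/column/database names rather than the actual test schema
(the quoted text implies an mtime column was added to the pgbench
tables):

psql test <<'SQL'
-- An index on the update-time column defeats HOT: every UPDATE must
-- insert a new entry into mtime_idx, even when only abalance changes.
CREATE INDEX IF NOT EXISTS mtime_idx ON pgbench_accounts (mtime);

-- A big range scan in mtime order is hurt directly by version-driven
-- bloat in mtime_idx, so it shows off the patch keeping the index at
-- (or near) its original size.
SELECT aid, abalance, mtime
FROM pgbench_accounts
ORDER BY mtime DESC
LIMIT 100000;
SQL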

Attached is v8, which has the enhancements for low cardinality data
that I mentioned earlier today. It also simplifies the logic for
dealing with posting lists that we need to delete some TIDs from.
These posting list simplifications also make the code a bit more
efficient, which might be noticeable during benchmarking.

Perhaps your "we have 5,2% slowdown in UPDATE speed" issue will be at
least somewhat fixed by the enhancements to v8?

Another consideration when testing the patch is the behavioral
difference we see between cases where system throughput is as high as
possible and otherwise-similar cases where we have a limit in place
(i.e. pgbench --rate=? was used). These two cases are quite different
with the patch, because without a limit the patch can no longer be
lazy -- we tend to see noticeably more CPU cycles spent doing
bottom-up deletion, though everything else about the profile looks
similar (I generally use "perf top" to keep an eye on these things).

It's possible to see occasional increases in latency (regressions)
when running without a limit, at least in the short term. These
increases can go away once a rate limit is imposed, even one as high
as perhaps 50% of the max TPS. In general, I think that it makes sense
to focus on latency when we have some kind of limit in place. A
non-rate-limited case is less realistic.
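
In case it's useful to anyone reproducing this, the two kinds of runs
look something like the following (a sketch only -- client counts,
duration, database name, and the specific rate are made-up numbers;
the rate would be derived from whatever max TPS the unlimited run
actually showed):

# Unlimited run: the patch cannot afford to be lazy here, so expect
# more CPU cycles in bottom-up deletion (watch with "perf top").
pgbench -M prepared -c 16 -j 16 -T 1800 -P 10 test

# Rate-limited run: cap throughput at roughly 50% of the max TPS
# observed above (say 10000 TPS if the unlimited run managed ~20000).
# -T can be raised (e.g. to 28800) for the longer 8+ hour sessions
# discussed below.
pgbench -M prepared -c 16 -j 16 -T 1800 -P 10 --rate=10000 test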

> Perhaps I need to conduct a longer test session, say 8+ hours to make obstacles appear more like in real life.

That would be ideal. It is inconvenient to run longer benchmarks, but
it's an important part of performance validation.

BTW, another related factor is that the patch takes longer to "warm
up". I notice this "warm-up" effect on the second or subsequent runs,
where we have lots of garbage in indexes even with the patch, and even
in the first 5 seconds of execution. The extra I/Os for heap page
accesses end up being buffer misses instead of buffer hits, until the
cache warms. This is not really a problem with fewer, longer runs,
because there is relatively little "special warm-up time". (We rarely
experience heap page misses during ordinary execution because the
heapam.c code is smart about locality of access.)
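
One way to actually watch the warm-up effect is to compare heap buffer
hits against misses across runs -- a sketch, assuming the standard
pgbench_accounts table, a made-up database name, and that the stats
are reset before each run:

# Reset this database's I/O statistics so the numbers are per-run.
psql -c "SELECT pg_stat_reset();" test

# During or after a run: heap_blks_read counts the buffer misses that
# make the extra heap accesses expensive until the cache warms.
psql -c "SELECT relname, heap_blks_read, heap_blks_hit
         FROM pg_statio_user_tables
         WHERE relname = 'pgbench_accounts';" test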

I noticed that the pgbench_accounts_pkey index doesn't grow at all on
the master branch in 20201111-results-master.txt. But without the patch
it's always just a matter of time until it does grow. The PK/unique
index takes longer to bloat because it alone benefits from LP_DEAD
setting, especially within _bt_check_unique(). But this advantage will
naturally erode over time. It'll probably take a couple of hours or
more with larger scale factors -- I'm thinking of pgbench scale
factors over 2000.
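
If you want to track that during your runs, something along these
lines works (a sketch; pgstatindex() is from the pgstattuple
extension, and the database name is made up):

psql -c "CREATE EXTENSION IF NOT EXISTS pgstattuple;" test

# Index size tells most of the story; avg_leaf_density shows bloat
# accumulating before the on-disk size visibly jumps.
psql -c "SELECT pg_size_pretty(pg_relation_size('pgbench_accounts_pkey')) AS pkey_size,
                leaf_pages, avg_leaf_density
         FROM pgstatindex('pgbench_accounts_pkey');" test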

When the LP_DEAD bit setting isn't very effective (say it's 50%
effective), it's only a matter of time until every original page
splits. But that's also true when LP_DEAD setting is 99% effective.
While it is much less likely that any individual page will split when
LP_DEAD bits are almost always set, the fundamental problem remains,
even with 99% effectiveness. That problem is: each page only has to be
unlucky once. On a long enough timeline, the difference between 50%
effective and 99% effective may be very small. And "long enough
timeline" may not actually be very long, at least to a human.

Of course, the patch still benefits from LP_DEAD bits getting set by
queries -- no change there. It just doesn't *rely* on LP_DEAD bits
keeping up with transactions that create bloat on every leaf page.
Maybe the patch will behave exactly the same way as the master branch
-- it's workload-dependent. Actually, it behaves in exactly the same
way for about the first 5-15 minutes following pgbench
initialization. This is roughly how long it takes before the master
branch has even one page split. You could say that the patch "makes
the first 5 minutes last forever".

(Not sure if any of this is obvious to you by now, just saying.)

Thanks!
--
Peter Geoghegan

Attachment Content-Type Size
v8-0001-Deprecate-nbtree-s-BTP_HAS_GARBAGE-flag.patch application/octet-stream 21.8 KB
v8-0002-Make-tableam-interface-support-bottom-up-deletion.patch application/octet-stream 7.8 KB
v8-0004-Teach-nbtree-to-use-bottom-up-index-deletion.patch application/octet-stream 53.1 KB
v8-0005-Teach-heapam-to-support-bottom-up-index-deletion.patch application/octet-stream 24.8 KB
v8-0003-Pass-down-logically-unchanged-index-hint.patch application/octet-stream 26.7 KB
