Re: Slow standby snapshot

From: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>
To: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, reshkekirill <reshkekirill(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Slow standby snapshot
Date: 2022-11-16 12:23:46
Message-ID: CANtu0oiPoSdQsjRd6Red5WMHi1E83d2+-bM9J6dtWR3c5Tap9g@mail.gmail.com
Lists: pgsql-hackers

Hello everyone.

> However ... I tried to reproduce the original complaint, and
> failed entirely. I do see KnownAssignedXidsGetAndSetXmin
> eating a bit of time in the standby backends, but it's under 1%
> and doesn't seem to be rising over time. Perhaps we've already
> applied some optimization that ameliorates the problem? But
> I tested v13 as well as HEAD, and got the same results.

> Hmm. I wonder if my inability to detect a problem is because the startup
> process does keep ahead of the workload on my machine, while it fails
> to do so on the OP's machine. I've only got a 16-CPU machine at hand,
> which probably limits the ability of the primary to saturate the standby's
> startup process.

Yes, optimization by Andres Freund made things much better, but the
impact is still noticeable.

I was also using 16-CPU machines, but two of them (primary and standby).

Here are the scripts I used for the benchmark [1]; maybe they will help.

> Nowadays we've *got* those primitives. Can we get rid of
> known_assigned_xids_lck, and if so would it make a meaningful
> difference in this scenario?

I have already tried that, but was unable to find any real benefit from it.
A WIP patch is attached.

Hm, I see I sent it to the list, but it is absent from the archives... To
quote from it:

> First potential positive effect I could see is
> (TransactionIdIsInProgress -> KnownAssignedXidsSearch) locking but
> seems like it is not on standby hotpath.

> Second one - locking for KnownAssignedXidsGetAndSetXmin (build
> snapshot). But I was unable to measure impact. It wasn’t visible
> separately in (3) test.

> Maybe someone knows scenario causing known_assigned_xids_lck or
> TransactionIdIsInProgress become bottleneck on standby?

The last question is still open :)
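
For readers following the thread, the general idea of replacing a spinlock
with barriers can be sketched as below. This is not the attached patch: the
names XidRing, xid_ring_init, writer_append and reader_copy are hypothetical,
the sketch assumes a single writer (as the startup process would be) that only
appends, and it uses C11 atomics as a stand-in for PostgreSQL's
pg_write_barrier() / pg_read_barrier().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define XID_RING_SLOTS 64

/* Hypothetical, simplified stand-in for a shared xid array. */
typedef struct XidRing
{
    uint32_t    xids[XID_RING_SLOTS];
    atomic_int  head;           /* number of published entries */
} XidRing;

static void
xid_ring_init(XidRing *r)
{
    atomic_init(&r->head, 0);
}

/* Single writer: fill the slot first, then publish it via head. */
static void
writer_append(XidRing *r, uint32_t xid)
{
    int         h = atomic_load_explicit(&r->head, memory_order_relaxed);

    assert(h < XID_RING_SLOTS);
    r->xids[h] = xid;
    /* Release: the slot contents become visible before the new head. */
    atomic_store_explicit(&r->head, h + 1, memory_order_release);
}

/* Any reader: read head first, then only the slots below it. */
static int
reader_copy(XidRing *r, uint32_t *out)
{
    /* Acquire: pairs with the writer's release store on head. */
    int         h = atomic_load_explicit(&r->head, memory_order_acquire);

    for (int i = 0; i < h; i++)
        out[i] = r->xids[i];
    return h;
}
```

Readers never write to the shared structure here, so they do not need to take
the cache line in exclusive mode the way a spinlock acquisition would. Removal
and compression of entries are deliberately left out of the sketch.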

> I think it might be a bigger effect than one might immediately think. Because
> the spinlock will typically be on the same cacheline as head/tail, and because
> every spinlock acquisition requires the cacheline to be modified (and thus
> owned exclusively) by the current core, uses of head/tail will very commonly
> be cache misses even in workloads without a lot of KAX activity.

I tried to find a practical way to get any noticeable impact here, but
without success.

>> But yeah, it does feel like the proposed
>> approach is only going to be optimal over a small range of conditions.

> In particular, it doesn't adapt at all to workloads that don't replay all that
> much, but do compute a lot of snapshots.

Approach [2] was optimized to avoid any additional work for anyone
except non-startup processes (it uses offsets to skip gaps while
building a snapshot).
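
A minimal single-threaded sketch of the "offsets to skip gaps" idea follows.
It is only an illustration, not the code from [2]: XidSlots,
collect_valid_xids and next_offset are hypothetical names, and it assumes that
a slot, once invalidated, never becomes valid again before compression. The
scanning process (a backend building a snapshot) is the one that maintains the
offsets, so the writer does no extra work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical, simplified stand-in for the known-assigned-xids array. */
typedef struct XidSlots
{
    TransactionId *xids;        /* xid stored in each slot */
    bool       *valid;          /* false once the xid has been removed */
    int        *next_offset;    /* hint: distance to the next useful slot, >= 1 */
    int         tail;           /* first slot to scan */
    int         head;           /* one past the last slot to scan */
} XidSlots;

/*
 * Copy all valid xids into out and return how many were copied.  While
 * scanning, remember the distance between consecutive valid slots so that
 * the next scan can jump over the invalid gap in one step.
 */
static int
collect_valid_xids(XidSlots *s, TransactionId *out)
{
    int         count = 0;
    int         prev = -1;      /* last valid slot seen, or -1 */
    int         i = s->tail;

    while (i < s->head)
    {
        assert(s->next_offset[i] >= 1);
        if (s->valid[i])
        {
            out[count++] = s->xids[i];
            if (prev >= 0)
                s->next_offset[prev] = i - prev;    /* record the gap */
            prev = i;
        }
        i += s->next_offset[i];
    }
    return count;
}
```

With all offsets initialized to 1, the first scan behaves like a plain linear
scan; later scans visit fewer slots as gaps accumulate. Concurrent updates of
the offsets by several backends are not modeled in this sketch.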

[1]: https://gist.github.com/michail-nikolaev/e1dfc70bdd7cfd1b902523dbb3db2f28
[2]: https://www.postgresql.org/message-id/flat/CANtu0ogzo4MsR7My9%2BNhu3to5%3Dy7G9zSzUbxfWYOn9W5FfHjTA%40mail.gmail.com#341a3c3b033f69b260120b3173a66382

--
Michail Nikolaev

Attachment                                     Content-Type  Size
0001-memory-barrier-instead-of-spinlock.patch  text/plain    5.4 KB
