From: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Subject: | RE: walsender performance regression due to logical decoding on standby changes |
Date: | 2023-05-11 09:42:39 |
Message-ID: | OS0PR01MB57165FA7A7B24BEF34BAE75194749@OS0PR01MB5716.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wednesday, May 10, 2023 3:03 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> Unfortunately I have found the following commit to have caused a
> performance
> regression:
>
> commit e101dfac3a53c20bfbf1ca85d30a368c2954facf
> Author: Andres Freund <andres(at)anarazel(dot)de>
> Date: 2023-04-08 00:24:24 -0700
>
> For cascading replication, wake physical and logical walsenders separately
>
> Physical walsenders can't send data until it's been flushed; logical
> walsenders can't decode and send data until it's been applied. On the
> standby, the WAL is flushed first, which will only wake up physical
> walsenders; and then applied, which will only wake up logical
> walsenders.
>
> Previously, all walsenders were awakened when the WAL was flushed. That
> was fine for logical walsenders on the primary; but on the standby the
> flushed WAL would have been not applied yet, so logical walsenders were
> awakened too early.
>
> Per idea from Jeff Davis and Amit Kapila.
>
> Author: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
> Reviewed-By: Jeff Davis <pgsql(at)j-davis(dot)com>
> Reviewed-By: Robert Haas <robertmhaas(at)gmail(dot)com>
> Reviewed-by: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> Reviewed-by: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
> Discussion:
> https://postgr.es/m/CAA4eK1+zO5LUeisabX10c81LU-fWMKO4M9Wyg1cdkb
> W7Hqh6vQ(at)mail(dot)gmail(dot)com
>
> The problem is that, on a standby, after the change - as needed to for the
> approach to work - the call to WalSndWakeup() in ApplyWalRecord() happens
> for
> every record, instead of only happening when the timeline is changed (or WAL
> is flushed or ...).
>
> WalSndWakeup() iterates over all walsender slots, regardless of whether in
> use. For each of the walsender slots it acquires a spinlock.
>
> When replaying a lot of small-ish WAL records I found the startup process to
> spend the majority of the time in WalSndWakeup(). I've not measured it very
> precisely yet, but the overhead is significant (~35% slowdown), even with the
> default max_wal_senders. If that's increased substantially, it obviously gets
> worse.
I did some simple tests for this to see the performance impact on
the streaming replication, just share it here for reference.
1) sync primary-standby setup, load data on primary and count the time spent on
replication. the degradation will be more obvious as the value of max_wal_senders
increases.
max_wal_senders before(ms) after(ms) degradation
100 13394.4013 14141.2615 5.58%
200 13280.8507 14597.1173 9.91%
300 13535.0232 16735.7379 23.65%
2) Similar as 1) but count the time that the standby startup process spent on
replaying WAL(via gprof).
10 senders
===========
before
% cumulative self self total
time seconds seconds calls s/call s/call name
4.12 0.45 0.11 1 0.11 2.46 PerformWalRecovery
after
% cumulative self self total
time seconds seconds calls s/call s/call name
17.99 0.59 0.59 4027383 0.00 0.00 WalSndWakeup
8.23 0.86 0.27 1 0.27 3.11 PerformWalRecovery
100 senders
===========
before
% cumulative self self total
time seconds seconds calls s/call s/call name
5.56 0.36 0.18 1 0.18 2.91 PerformWalRecovery
after
% cumulative self self total
time seconds seconds calls s/call s/call name
64.65 4.39 4.39 4027383 0.00 0.00 WalSndWakeup
2.95 4.59 0.20 1 0.20 6.62 PerformWalRecovery
Will test after applying the latest patch in this thread later.
Best Regards,
Hou zj
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Gustafsson | 2023-05-11 09:57:37 | Re: Redundant strlen(query) in get_rel_infos |
Previous Message | Peter Eisentraut | 2023-05-11 09:37:00 | Re: smgrzeroextend clarification |