RE: Synchronizing slots from primary to standby

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: RE: Synchronizing slots from primary to standby
Date: 2024-03-13 07:58:52
Message-ID: OS0PR01MB571671A91E602BEFD3083EEE942A2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Friday, March 8, 2024 1:09 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> On Fri, Mar 8, 2024 at 9:56 AM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
> >
> >> Pushed with minor modifications. I'll keep an eye on BF.
> >>
> >> BTW, one thing that we should try to evaluate a bit more is the
> >> traversal of slots in StandbySlotsHaveCaughtup() where we verify if
> >> all the slots mentioned in standby_slot_names have received the
> >> required WAL. Even if the standby_slot_names list is short the total
> >> number of slots can be much larger which can lead to an increase in
> >> CPU usage during traversal. There is an optimization that allows to
> >> cache ss_oldest_flush_lsn and ensures that we don't need to traverse
> >> the slots each time so it may not hit frequently but still there is a
> >> chance. I see it is possible to further optimize this area by caching
> >> the position of each slot mentioned in standby_slot_names in
> >> replication_slots array but not sure whether it is worth.
> >>
> >>
> >
> > I tried to test this by configuring a large number of logical slots while
> > making sure the standby slots are at the end of the array and checking if
> > there was any performance hit in logical replication from these searches.
> >
>

Thanks to Nisha for running some additional tests and discussing them with me
internally. We have collected performance data on HEAD. In short, we don't see
a noticeable difference in the performance numbers, and StandbySlotsHaveCaughtup
does not stand out in the profile either.

Here are the details:

> 1) Redoing XLogSendLogical time-log related test with
> 'sync_replication_slots' enabled.

Setup:
- one primary + 3 standbys + one subscriber with one active subscription
- ran pgbench for 15 min in each case
- hot_standby_feedback=ON and sync_replication_slots=TRUE

(To maximize the impact of the SearchNamedReplicationSlot() search, the standby
slot is placed at the end of the ReplicationSlotCtl->replication_slots array in
each test.)

Case1 - 1 slot: 895.305565 secs
Case2 - 100 slots: 894.936039 secs
Case3 - 500 slots: 895.256412 secs

> 2) pg_recvlogical test to monitor lag in StandbySlotsHaveCaughtup() for a
> large number of slots.

We reran the XLogSendLogical() wait time analysis tests.
Setup:
- One primary node and 3 standby nodes
- Created logical slots using "test_decoding" and activated one walsender by running pg_recvlogical on one slot.
- hot_standby_feedback=ON and sync_replication_slots=TRUE
- Did one run for each case with pgbench for 15 min

(To maximize the impact of the SearchNamedReplicationSlot() search, the standby
slot is placed at the end of the ReplicationSlotCtl->replication_slots array in
each test.)

Case1 - 1 slot: 894.83775 secs
Case2 - 100 slots: 894.449356 secs
Case3 - 500 slots: 894.98479 secs

There is no noticeable regression when the number of replication slots increases.

> 3) Profiling to see if StandbySlotsHaveCaughtup() is noticeable in the report
> when there are a large number of slots to traverse.

The setup is the same as in 2). To maximize the impact of the
SearchNamedReplicationSlot() search, the standby slot is placed at the end of
the ReplicationSlotCtl->replication_slots array.

StandbySlotsHaveCaughtup is not noticeable in the profile:

0.03% 0.00% postgres postgres [.] StandbySlotsHaveCaughtup

After some investigation, it appears that the cached 'ss_oldest_flush_lsn'
plays a crucial role in optimizing this workload, effectively reducing the need
for frequent strcmp operations within the loop.
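
To illustrate, here is a minimal sketch of the fast path that the cached LSN
enables. This is not the committed function body, and the helper name
standby_slots_fast_path is made up for illustration; only ss_oldest_flush_lsn
and the XLogRecPtr macros correspond to real code:

#include "postgres.h"
#include "access/xlogdefs.h"

/* Cached minimum LSN confirmed by all slots in standby_slot_names. */
static XLogRecPtr ss_oldest_flush_lsn = InvalidXLogRecPtr;

/*
 * Fast path: if the cached minimum already covers wait_for_lsn, the standbys
 * have caught up and the per-slot loop (and all its strcmp calls) can be
 * skipped entirely.
 */
static bool
standby_slots_fast_path(XLogRecPtr wait_for_lsn)
{
    return !XLogRecPtrIsInvalid(ss_oldest_flush_lsn) &&
           ss_oldest_flush_lsn >= wait_for_lsn;
}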

To measure the impact of frequent strcmp calls, we removed the
'ss_oldest_flush_lsn' check and re-evaluated the profile. This time, although
the profile showed a small increase for StandbySlotsHaveCaughtup, it still does
not raise significant concerns:

--1.47%--NeedToWaitForWal
| NeedToWaitForStandbys
| StandbySlotsHaveCaughtup
| |
| --0.96%--SearchNamedReplicationSlot

The scripts used to set up the test environment for all of the above tests are
attached. The machine configuration is as follows:
CPU : E7-4890 v2 (2.8GHz / 15 cores) × 4
MEM : 768GB
HDD : 600GB × 2
OS : RHEL 7.9

While no noticeable overhead was observed from SearchNamedReplicationSlot, we
explored a strategy to reduce the cost further by minimizing the search for
standby slots within the loop. The idea is to cache the position of each
standby slot within ReplicationSlotCtl->replication_slots and reference the
slot directly via ReplicationSlotCtl->replication_slots[index]. If the slot
name at the cached index still matches, we perform the other checks including
the restart_lsn; otherwise, SearchNamedReplicationSlot is invoked to refresh
the cached index. With n slots in standby_slot_names and m slots in the
replication_slots array, this optimization can reduce the cost from O(n*m) to
O(n). A small sketch of the idea follows.
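
Below is a minimal sketch of the cached-index lookup. Only
SearchNamedReplicationSlot(), ReplicationSlotCtl, and the slot fields used are
real; the cache structure and how it is populated, the locking, and the other
slot checks are simplified placeholders, so this shows the approach rather than
the attached patch itself:

#include "postgres.h"
#include "replication/slot.h"
#include "storage/spin.h"

/*
 * Hypothetical cache: one entry per name in standby_slot_names, remembering
 * where that slot was last found in ReplicationSlotCtl->replication_slots.
 */
typedef struct StandbySlotCacheEntry
{
    char    name[NAMEDATALEN];  /* slot name from standby_slot_names */
    int     index;              /* cached array position, -1 if unknown */
} StandbySlotCacheEntry;

static StandbySlotCacheEntry *standby_slot_cache;  /* one entry per name */
static int  standby_slot_count;

static bool
StandbySlotsHaveCaughtup_cached(XLogRecPtr wait_for_lsn)
{
    /* Locking of ReplicationSlotControlLock is omitted here for brevity. */
    for (int i = 0; i < standby_slot_count; i++)
    {
        StandbySlotCacheEntry *entry = &standby_slot_cache[i];
        ReplicationSlot *slot = NULL;
        XLogRecPtr  restart_lsn;

        /* Try the cached position first: O(1) instead of a full array scan. */
        if (entry->index >= 0)
        {
            slot = &ReplicationSlotCtl->replication_slots[entry->index];
            if (!slot->in_use ||
                strcmp(NameStr(slot->data.name), entry->name) != 0)
                slot = NULL;    /* the cached index is stale */
        }

        /* Fall back to the full search and refresh the cached index. */
        if (slot == NULL)
        {
            slot = SearchNamedReplicationSlot(entry->name, true);
            if (slot == NULL)
                return false;   /* slot is missing; cannot have caught up */
            entry->index = (int) (slot - ReplicationSlotCtl->replication_slots);
        }

        /* Same kind of checks as today (other checks elided for brevity). */
        SpinLockAcquire(&slot->mutex);
        restart_lsn = slot->data.restart_lsn;
        SpinLockRelease(&slot->mutex);

        if (XLogRecPtrIsInvalid(restart_lsn) || restart_lsn < wait_for_lsn)
            return false;
    }

    return true;
}

The cached index only goes stale when a slot named in standby_slot_names is
dropped and recreated at a different array position, so the fallback search
should be rare.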

Note that since we didn't see any overhead in the tests, I am not proposing to
push this patch now; I am just sharing the idea and a small patch in case
anyone comes across a workload where the performance impact of
SearchNamedReplicationSlot becomes noticeable.

Best Regards,
Hou zj

Attachment Content-Type Size
0001-Cache-standby-slot-index.patch.txt text/plain 3.9 KB
test_3.zip application/x-zip-compressed 139.0 KB
test_2.zip application/x-zip-compressed 2.7 KB
test_1.zip application/x-zip-compressed 2.9 KB
