RE: speed up a logical replica setup

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Shubham Khanna' <khannashubham1197(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Euler Taveira <euler(at)eulerto(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(at)eisentraut(dot)org>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Subject: RE: speed up a logical replica setup
Date: 2024-01-25 05:54:32
Message-ID: TY3PR01MB9889C1E33C73DC064B3AEA90F57A2@TY3PR01MB9889.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear hackers,

As requested, I profiled the performance test. It showed that in the small-data
cases there are mainly two bottlenecks: starting a server and waiting until
recovery is done.

# Tested source code

The v7 patch set was applied atop HEAD (0eb23285). No configure options were
specified when it was built.

# Tested workload

I focused only on the 100MB and 1GB cases because the larger ones already performed well.
(The number of inserted tuples was the same as in the previous tests.)
I used a bash script instead of the TAP test framework. See attached. The executed
SQL statements and operations were almost the same.

As you can see, I tested only the one-database case. Results may change if the
number of databases changes.

# Measurement
Some debug logs that output the current time were added (please see the diff file).
I picked some events and took timestamps before and after each of them. The bullets
below show the measured events:

* Starting a server
* Stopping a server
* Creating replication slots
* Creating publications
* Waiting until the recovery ended
* Creating subscriptions
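
The before/after timestamp approach can be sketched generically as below. This is
not the attached diff (which adds debug logs to the patched source); the `timed`
helper and the `elapsed` dictionary are hypothetical names used only for
illustration, and the sleep stands in for a real operation.

```python
import time
from contextlib import contextmanager

# Hypothetical accumulator: event name -> elapsed seconds.
elapsed = {}


@contextmanager
def timed(event):
    """Record the wall-clock time spent inside the with-block under `event`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Accumulate so that an event measured several times is summed.
        elapsed[event] = elapsed.get(event, 0.0) + (time.perf_counter() - start)


# Placeholder workload; a real run would start the server, create slots, etc.
with timed("Starting a server"):
    time.sleep(0.01)

for event, seconds in elapsed.items():
    print(f"{event}: {seconds:.3f} sec")
```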

# Result
The table below shows the elapsed time for these events. Raw data is also available
in the attached Excel file.

|Event category                  |100MB case [sec]|1GB case [sec]|
|Starting a server               |1.414           |1.417         |
|Stopping a server               |0.506           |0.506         |
|Creating replication slots      |0.005           |0.007         |
|Creating publications           |0.001           |0.002         |
|Waiting until the recovery ended|1.603           |14.529        |
|Creating subscriptions          |0.012           |0.012         |
|Total                           |3.541           |16.473        |
|Actual time                     |4.37            |17.271        |
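
For readers who want to re-derive the totals, the following is a small sketch that
sums the per-event times from the table, computes each event's share of the total,
and shows how much of the actual time is not covered by the measured events. The
numbers are copied from the table; the `summarize` function and variable names are
only illustrative.

```python
# Per-event elapsed times in seconds: (100MB case, 1GB case), from the table.
events = {
    "Starting a server": (1.414, 1.417),
    "Stopping a server": (0.506, 0.506),
    "Creating replication slots": (0.005, 0.007),
    "Creating publications": (0.001, 0.002),
    "Waiting until the recovery ended": (1.603, 14.529),
    "Creating subscriptions": (0.012, 0.012),
}
actual = (4.37, 17.271)  # "Actual time" row
cases = ("100MB", "1GB")


def summarize(case_index):
    """Return (total, per-event share of total, time not covered by events)."""
    total = sum(times[case_index] for times in events.values())
    shares = {name: times[case_index] / total for name, times in events.items()}
    unaccounted = actual[case_index] - total
    return total, shares, unaccounted


for i, case in enumerate(cases):
    total, shares, unaccounted = summarize(i)
    print(f"{case}: total={total:.3f} sec, unaccounted={unaccounted:.3f} sec")
    for name, share in shares.items():
        print(f"  {name}: {share:.1%}")
```

In both cases the measured events leave roughly 0.8 seconds of the actual time
unaccounted for, and in the 1GB case the recovery wait makes up most of the total.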

As you can see, starting the server and waiting for recovery seem slow. We cannot
omit these steps, but setting a smaller shared_buffers will reduce the start time.
One approach is to override the GUC with a smaller value, but I think we cannot
determine an appropriate value.
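
To experiment with this by hand, one could start the server with the GUC overridden
on the command line. The sketch below only builds such a `pg_ctl` command; it relies
on `pg_ctl`'s `-o` option to pass `-c shared_buffers=...` to the postgres process.
The data directory path and the 16MB value are arbitrary assumptions for
illustration, not recommended settings, and this is not what the patch itself does.

```python
import shlex


def build_start_command(datadir, shared_buffers):
    """Build a pg_ctl start command that overrides shared_buffers for this start."""
    return [
        "pg_ctl", "start",
        "-w",                                   # wait until startup completes
        "-D", datadir,                          # data directory (hypothetical path)
        "-o", f"-c shared_buffers={shared_buffers}",  # passed through to postgres
    ]


# Hypothetical values; pick ones that match the test environment.
cmd = build_start_command("/tmp/standby_data", "16MB")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run` and timed to compare start times for
different values.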

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
run.sh application/octet-stream 2.2 KB
perf_result.xlsx application/vnd.openxmlformats-officedocument.spreadsheetml.sheet 17.3 KB
add_debug_log.diff application/octet-stream 8.2 KB
