RE: Conflict detection for update_deleted in logical replication

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Nisha Moond <nisha(dot)moond412(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: RE: Conflict detection for update_deleted in logical replication
Date: 2025-01-20 06:53:39
Message-ID: OSCPR01MB14966F39BE1732B9E433023BFF5E72@OSCPR01MB14966.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear hackers,

I've created a new script that simulates a user reducing the workload on the
publisher side. The attached zip file contains the script, an execution log, and
the pgbench outputs. The experiments were done with the v24 patch set.

Abstract
======

In this test, the conflict slot was invalidated as expected while the workload
on the publisher was high, and it was no longer invalidated after the workload
was reduced. This shows that even if the slot has been invalidated once, users can
continue to detect update_deleted conflicts by reducing the workload on the publisher.
Also, after the workload on the publisher is reduced, the transactions per second
(TPS) on the subscriber side becomes almost the same as in the
retain_conflict_info = off case.

Workload
========
v23_measure.sh is the script I used. It is a bit complex, but it mostly does the following:

1. Construct a pub-sub replication system.
2. Run pgbench (TPC-B-like workload) on both nodes. Initially the parallelism
of pgbench is 30 on both nodes. While the benchmark is running, TPS is reported
once per second.
3. Check the status of the conflict slot periodically.
4. If the conflict slot is invalidated, stop pgbench on both nodes.
5. Disable the retain_conflict_info option and wait until the conflict slot is dropped.
6. Wait until all the changes on the publisher are replicated to the subscriber.
7. Enable retain_conflict_info and wait until the conflict slot is created.
8. Re-run pgbench on both nodes. This time, the parallelism on the publisher
side is cut in half.
9. Repeat steps 3-8 until the total benchmark time reaches 900s.
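
The halving in step 8 can be illustrated with a small sketch. This is not taken from the attached script; the function name is hypothetical, and integer (floor) halving is an assumption that happens to match the 30->15->7->3 sequence reported in the results.

```python
def publisher_parallelism_schedule(initial_clients, min_clients=1):
    """Yield the publisher-side pgbench client count for each successive run.

    Assumption: the count is halved with integer (floor) division each time
    the conflict slot is invalidated; the real script may compute it differently.
    """
    clients = initial_clients
    while clients >= min_clients:
        yield clients
        clients //= 2  # cut the parallelism in half, rounding down


schedule = list(publisher_parallelism_schedule(30))
print(schedule)
```

In the experiments the loop never went below 3 clients, because the conflict slot was no longer invalidated at that level.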

Parameters
==========

Publisher GUCs:
shared_buffers = '30GB'
max_wal_size = 20GB
min_wal_size = 10GB
wal_level = logical

Subscriber GUCs:

autovacuum_naptime = '30s'
shared_buffers = '30GB'
max_wal_size = 20GB
min_wal_size = 10GB
track_commit_timestamp = on

Two values of max_conflict_retention_duration were tested: 60s and 120s.

Results for max_conflict_retention_duration = 60s
================================

The parallelism on the publisher side was reduced in steps, 30->15->7->3, at which
point the conflict slot was no longer invalidated. The tables below show 1) the
parallelism of the pgbench run, 2) the duration of the run at that parallelism,
and 3) the observed TPS of each iteration.

Publisher side
nclients  Ran duration (s)      TPS
      30                80  34587.9
      15                83  19148.2
       7                87   9609.1
       3               647   4120.7

Subscriber side
nclients  Ran duration (s)      TPS
      30                80  10688
      30                83  10834
      30                87  12327.5
      30               647  33300.1

For the 30/15/7 cases, the conflict slot was invalidated after around 80s, but
it survived with parallelism = 3. At that point, the TPS on the subscriber side
became almost the same as that of the publisher with nclients=30.
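
As a rough cross-check of the comparison above, the following sketch recomputes the subscriber TPS relative to the publisher TPS at nclients=30, using only the numbers from the 60s tables. The variable and function names are illustrative and not part of the attached script.

```python
# (publisher nclients, run duration in s, publisher TPS, subscriber TPS),
# copied from the tables for max_conflict_retention_duration = 60s.
runs_60s = [
    (30, 80, 34587.9, 10688.0),
    (15, 83, 19148.2, 10834.0),
    (7, 87, 9609.1, 12327.5),
    (3, 647, 4120.7, 33300.1),
]


def subscriber_vs_baseline(runs):
    """Return subscriber TPS of each run divided by the publisher TPS of the first run."""
    baseline = runs[0][2]  # publisher TPS with nclients=30
    return [sub_tps / baseline for (_, _, _, sub_tps) in runs]


ratios = subscriber_vs_baseline(runs_60s)
total_duration = sum(duration for (_, duration, _, _) in runs_60s)

for (nclients, _, _, _), ratio in zip(runs_60s, ratios):
    print(f"publisher nclients={nclients}: subscriber/baseline = {ratio:.2f}")
print(f"total run time: {total_duration}s")
```

With these inputs the ratio stays at roughly 0.3 while the slot keeps being invalidated and rises to about 0.96 in the final run; the run durations add up to 897s, in line with the 900s budget.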

Results for max_conflict_retention_duration = 120s
=================================

The trend was mostly the same as in the 60s case.

Publisher side
nclients  Ran duration (s)      TPS
      30               155  28979.3
      15               157  19333.9
       7               196   9875.2
       3               389   4539

Subscriber side
nclients  Ran duration (s)      TPS
      30               155   5925
      30               157   6912
      30               196   9157.1
      30               389  35736.6

Noticed
=====

While creating the script, I found that step 6 (wait until all the changes on the
publisher are replicated to the subscriber) was necessary. If it was skipped,
the slot would soon be invalidated again. This is because the remaining changes had
not yet been replicated to the subscriber, and the catch-up was delayed by them.
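
A wait of this kind can be sketched as a polling loop. This is only an illustration under assumptions, not the logic of the attached script: the two callables are hypothetical hooks that return LSN strings, for example the result of `SELECT pg_current_wal_lsn()` on the publisher and the `confirmed_flush_lsn` of the subscription's slot in `pg_replication_slots`.

```python
import time


def lsn_to_int(lsn):
    """Convert a PostgreSQL LSN string such as '16/B374D848' to an integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wait_until_caught_up(get_publisher_lsn, get_subscriber_lsn,
                         poll_interval=1.0, timeout=600.0, sleep=time.sleep):
    """Poll until the subscriber has reached the publisher's LSN at call time.

    Both getters are hypothetical callables supplied by the caller.
    Returns True on catch-up and False if the timeout expires first.
    """
    target = lsn_to_int(get_publisher_lsn())  # fixed target: pgbench is stopped
    waited = 0.0
    while lsn_to_int(get_subscriber_lsn()) < target:
        if waited >= timeout:
            return False
        sleep(poll_interval)
        waited += poll_interval
    return True
```

Taking the target LSN once is sufficient here because pgbench has already been stopped on both nodes in step 4, so the publisher is not generating new benchmark changes while waiting.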

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
resulsts.zip application/x-zip-compressed 63.7 KB
