Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, "Wei Wang (Fujitsu)" <wangw(dot)fnst(at)fujitsu(dot)com>, "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date: 2023-08-09 04:21:03
Message-ID: CALDaNm1TA068E2niJFUR9ig+Yz3-ank=j5=j-2UocbzaDnQPrA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Melih,

Here is a patch to help in getting the execution at various phases
like: a) replication slot creation time, b) Wal reading c) Number of
WAL records read d) subscription relation state change etc
Couple of observation while we tested with this patch:
1) We noticed that the patch takes more time for finding the decoding
start point.
2) Another observation was that the number of XLOG records read for
identify the consistent point was significantly high with the v26_0001
patch.

HEAD
postgres=# select avg(counttime)/1000 "avgtime(ms)",
median(counttime)/1000 "median(ms)", min(counttime)/1000
"mintime(ms)", max(counttime)/1000 "maxtime(ms)", logtype from test
group by logtype;
avgtime(ms) | median(ms) | mintime(ms) |
maxtime(ms) | logtype
------------------------+------------------------+-------------+-------------+--------------------------
0.00579245283018867920 | 0.00200000000000000000 | 0 |
1 | SNAPSHOT_BUILD
1.2246811320754717 | 0.98550000000000000000 | 0 |
37 | LOGICAL_SLOT_CREATION
171.0863283018867920 | 183.9120000000000000 | 0 |
408 | FIND_DECODING_STARTPOINT
2.0699433962264151 | 1.4380000000000000 | 1 |
49 | INIT_DECODING_CONTEXT
(4 rows)

HEAD + v26-0001 patch
postgres=# select avg(counttime)/1000 "avgtime(ms)",
median(counttime)/1000 "median(ms)", min(counttime)/1000
"mintime(ms)", max(counttime)/1000 "maxtime(ms)", logtype from test
group by logtype;
avgtime(ms) | median(ms) | mintime(ms) |
maxtime(ms) | logtype
------------------------+------------------------+-------------+-------------+--------------------------
0.00588113207547169810 | 0.00500000000000000000 | 0 |
0 | SNAPSHOT_BUILD
1.1270962264150943 | 1.1000000000000000 | 0 |
2 | LOGICAL_SLOT_CREATION
301.1745528301886790 | 410.4870000000000000 | 0 |
427 | FIND_DECODING_STARTPOINT
1.4814660377358491 | 1.4530000000000000 | 1 |
9 | INIT_DECODING_CONTEXT
(4 rows)

In the above FIND_DECODING_STARTPOINT is very much higher with V26-0001 patch.

HEAD
FIND_DECODING_XLOG_RECORD_COUNT
- average = 2762
- median = 3362

HEAD + reuse worker patch(v26_0001 patch)
Where FIND_DECODING_XLOG_RECORD_COUNT
- average = 4105
- median = 5345

Similarly Number of xlog records read is higher with v26_0001 patch.

Steps to calculate the timing:
-- first collect the necessary LOG from subscriber's log.
cat *.log | grep -E
'(LOGICAL_SLOT_CREATION|INIT_DECODING_CONTEXT|FIND_DECODING_STARTPOINT|SNAPSHOT_BUILD|FIND_DECODING_XLOG_RECORD_COUNT|LOGICAL_XLOG_READ|LOGICAL_DECODE_PROCESS_RECORD|LOGICAL_WAIT_TRANSACTION)'
> grep.dat

create table testv26(logtime varchar, pid varchar, level varchar,
space varchar, logtype varchar, counttime int);
-- then copy these datas into db table to count the avg number.
COPY testv26 FROM '/home/logs/grep.dat' DELIMITER ' ';

-- Finally, use the SQL to analyze the data:
select avg(counttime)/1000 "avgtime(ms)", logtype from testv26 group by logtype;

--- To get the number of xlog records read:
select avg(counttime) from testv26 where logtype
='FIND_DECODING_XLOG_RECORD_COUNT' and counttime != 1;

Thanks to Peter and Hou-san who helped in finding these out. We are
parallely analysing this, @Melih Mutlu posting this information so
that it might help you too in analysing this issue.

Regards,
Vignesh

Attachment Content-Type Size
v1-0001-count-state-change-time.patch text/x-patch 3.2 KB
v1-0002-Logs-to-measure-creation-of-replication-slot-brea.patch text/x-patch 7.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2023-08-09 04:41:58 Re: Reducing memory consumed by RestrictInfo list translations in partitionwise join planning
Previous Message Amit Kapila 2023-08-09 04:15:15 Re: [PoC] pg_upgrade: allow to upgrade publisher node