Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, "Wei Wang (Fujitsu)" <wangw(dot)fnst(at)fujitsu(dot)com>, "Yu Shi (Fujitsu)" <shiy(dot)fnst(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication
Date: 2023-08-11 12:24:03
Message-ID: CALDaNm0Kc2Y-GAstv2SRXWZRfd0bYxqViDRx4baCURkmego1cA@mail.gmail.com
Lists: pgsql-hackers

On Fri, 11 Aug 2023 at 16:26, vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Wed, 9 Aug 2023 at 09:51, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > Hi Melih,
> >
> > Here is a patch to help in measuring the execution time at various
> > phases, like: a) replication slot creation time, b) WAL reading,
> > c) number of WAL records read, d) subscription relation state
> > changes, etc.
> > A couple of observations while we tested with this patch:
> > 1) We noticed that the v26 patch takes more time to find the
> > decoding start point.
> > 2) The number of XLOG records read to identify the consistent point
> > was significantly higher with the v26-0001 patch.
> >
> > HEAD
> > postgres=# select avg(counttime)/1000 "avgtime(ms)",
> > median(counttime)/1000 "median(ms)", min(counttime)/1000
> > "mintime(ms)", max(counttime)/1000 "maxtime(ms)", logtype from test
> > group by logtype;
> >       avgtime(ms)       |       median(ms)       | mintime(ms) | maxtime(ms) |         logtype
> > ------------------------+------------------------+-------------+-------------+--------------------------
> >  0.00579245283018867920 | 0.00200000000000000000 |           0 |           1 | SNAPSHOT_BUILD
> >      1.2246811320754717 | 0.98550000000000000000 |           0 |          37 | LOGICAL_SLOT_CREATION
> >    171.0863283018867920 |   183.9120000000000000 |           0 |         408 | FIND_DECODING_STARTPOINT
> >      2.0699433962264151 |     1.4380000000000000 |           1 |          49 | INIT_DECODING_CONTEXT
> > (4 rows)
> >
> > HEAD + v26-0001 patch
> > postgres=# select avg(counttime)/1000 "avgtime(ms)",
> > median(counttime)/1000 "median(ms)", min(counttime)/1000
> > "mintime(ms)", max(counttime)/1000 "maxtime(ms)", logtype from test
> > group by logtype;
> >       avgtime(ms)       |       median(ms)       | mintime(ms) | maxtime(ms) |         logtype
> > ------------------------+------------------------+-------------+-------------+--------------------------
> >  0.00588113207547169810 | 0.00500000000000000000 |           0 |           0 | SNAPSHOT_BUILD
> >      1.1270962264150943 |     1.1000000000000000 |           0 |           2 | LOGICAL_SLOT_CREATION
> >    301.1745528301886790 |   410.4870000000000000 |           0 |         427 | FIND_DECODING_STARTPOINT
> >      1.4814660377358491 |     1.4530000000000000 |           1 |           9 | INIT_DECODING_CONTEXT
> > (4 rows)
> >
> > In the above, FIND_DECODING_STARTPOINT is much higher with the
> > v26-0001 patch.
> >
> > HEAD
> > FIND_DECODING_XLOG_RECORD_COUNT
> > - average = 2762
> > - median = 3362
> >
> > HEAD + reuse worker patch (v26-0001)
> > FIND_DECODING_XLOG_RECORD_COUNT
> > - average = 4105
> > - median = 5345
> >
> > Similarly, the number of XLOG records read is higher with the
> > v26-0001 patch.
> >
> > Steps to calculate the timing:
> > -- first collect the necessary LOG entries from the subscriber's log:
> > cat *.log | grep -E
> > '(LOGICAL_SLOT_CREATION|INIT_DECODING_CONTEXT|FIND_DECODING_STARTPOINT|SNAPSHOT_BUILD|FIND_DECODING_XLOG_RECORD_COUNT|LOGICAL_XLOG_READ|LOGICAL_DECODE_PROCESS_RECORD|LOGICAL_WAIT_TRANSACTION)'
> > > grep.dat
> >
> > create table testv26(logtime varchar, pid varchar, level varchar,
> > space varchar, logtype varchar, counttime int);
> > -- then copy this data into the db table to compute the averages.
> > COPY testv26 FROM '/home/logs/grep.dat' DELIMITER ' ';
> >
> > -- Finally, use the SQL to analyze the data:
> > select avg(counttime)/1000 "avgtime(ms)", logtype from testv26 group by logtype;
> >
> > -- To get the number of XLOG records read:
> > select avg(counttime) from testv26 where logtype
> > ='FIND_DECODING_XLOG_RECORD_COUNT' and counttime != 1;
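
Note: median() as used in the queries above is not a built-in
PostgreSQL aggregate, so it is assumed to have been defined separately
for these tests; on a stock server the same value can be computed with
percentile_cont, for example:

select percentile_cont(0.5) within group (order by counttime)/1000
"median(ms)", logtype from testv26 group by logtype;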
> >
> > Thanks to Peter and Hou-san who helped in finding these out. We are
> > analysing this in parallel; @Melih Mutlu, I am posting this
> > information so that it might help you in analysing this issue too.
>
> I analysed further why it needs to read a larger number of XLOG
> records in some cases while creating the replication slot; here are my
> thoughts:
> Note: the tablesync worker needs to connect to the publisher and
> create a consistent point for the slot by reading the XLOG records.
> This requires that all the open transactions, and any transactions
> started while the consistent point is being created, be committed.
> I feel slot creation is faster in a few cases on HEAD because:
> Publisher | Subscriber
> ------------------------------------------------------------
> Begin txn1 transaction |
> Insert 1..1000 records |
> Commit |
> Begin txn2 transaction |
> Insert 1..1000 records | Apply worker applies transaction txn1
> | Start tablesync table t2
> | create consistent point in
> | publisher before transaction txn3 is
> | started
> commit | We just need to wait till
> | transaction txn2 is finished.
> Begin txn3 transaction |
> Insert 1..1000 records |
> commit |
>
> In V26, this is happening in some cases:
> Publisher | Subscriber
> ------------------------------------------------------------
> Begin txn1 transaction |
> Insert 1..1000 records |
> Commit |
> Begin txn2 transaction |
> Insert 1..1000 records | Apply worker applies transaction txn1
> | Start tablesync table t2
> commit | Create consistent point
> Begin txn3 transaction | (since transaction txn2 is committed
> | and txn3 is started, we will
> | need to wait
> | for transaction txn3 to be
> | committed)
> Insert 1..1000 records |
> commit |
>
> This is because in HEAD the tablesync worker is started right after a
> commit, so in some cases we are able to create the consistent point
> before a new transaction is started.
> Slot creation is fastest when the tablesync worker manages to connect
> to the publisher and create a consistent point before the next
> transaction starts. The probability of this is better in HEAD for this
> scenario, as the new tablesync worker is started just after a commit
> and therefore has a better time window (because the current
> transaction has only just started) before another transaction begins.
> This probability is slightly lower with the V26 version. I felt this
> is purely a timing issue in a few cases, caused by when new
> transactions happen to start while the slot is being created.

I used the following steps to analyse this issue:
Logs can be captured by applying the patches at [1].

-- first collect the necessary information from the publisher's log
from the execution of HEAD:
cat *.log | grep FIND_DECODING_XLOG_RECORD_COUNT > grep_head.dat

-- next collect the necessary information from the publisher's log
from the execution of v26:
cat *.log | grep FIND_DECODING_XLOG_RECORD_COUNT > grep_v26.dat

-- then copy this data into HEAD's db table to compute the averages.
COPY test_head FROM '/home/logs/grep_head.dat' DELIMITER ' ';

-- and copy this data into the v26 db table to compute the averages.
COPY test_v26 FROM '/home/logs/grep_v26.dat' DELIMITER ' ';
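Note: test_head and test_v26 are assumed to have the same definition as
the testv26 table from the earlier steps, i.e.:

create table test_head(logtime varchar, pid varchar, level varchar,
space varchar, logtype varchar, counttime int);
create table test_v26 (like test_head);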

Find the average of XLOG records read in HEAD:
postgres=# select avg(counttime) from test_head where logtype
='FIND_DECODING_XLOG_RECORD_COUNT' and counttime != 1;
avg
-----------------------
1394.1100000000000000
(1 row)

Find the average of XLOG records read in V26:
postgres=# select avg(counttime) from test_v26 where logtype
='FIND_DECODING_XLOG_RECORD_COUNT' and counttime != 1;
avg
-----------------------
1900.4100000000000000
(1 row)
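
The two runs can also be compared side by side in one query (a sketch
over the same tables; percentile_cont computes the median on a stock
server):

select 'HEAD' as run, avg(counttime),
       percentile_cont(0.5) within group (order by counttime) as median
from test_head
where logtype = 'FIND_DECODING_XLOG_RECORD_COUNT' and counttime != 1
union all
select 'v26', avg(counttime),
       percentile_cont(0.5) within group (order by counttime)
from test_v26
where logtype = 'FIND_DECODING_XLOG_RECORD_COUNT' and counttime != 1;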

While analysing why creating the replication slot needs to read more
records in a few cases, I found a very interesting observation: with
HEAD, about 29% of tables (29 out of 100) could find the consistent
point by reading the WAL records only up to the next subsequent COMMIT,
whereas with the V26 patch only 5% of tables could. In the remaining
cases the V26 patch had to read through another transaction of
approximately >1000 WAL records to reach the consistent point, which
raises the average number of records read with the V26 version. To
verify this, I got the start LSN and consistent LSN from the log files
by matching the corresponding FIND_DECODING_XLOG_RECORD_COUNT entries,
ran a waldump of the WAL file, searched the records between the start
LSN and the consistent LSN, and confirmed that only one COMMIT record
had to be read to reach the consistent point. Details of this
information from the logs of HEAD and V26 are attached.
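
On servers with the pg_walinspect extension (PostgreSQL 15 and later),
the same check can be done in SQL instead of with a waldump; a sketch,
with placeholder LSNs standing in for the values taken from the logs:

create extension if not exists pg_walinspect;
-- count the COMMIT records between the start LSN and the consistent LSN
select count(*)
from pg_get_wal_records_info('0/1577FB0', '0/15A2D38')
where resource_manager = 'Transaction' and record_type = 'COMMIT';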

The number of tables that reached the consistent point by reading less
than one transaction's worth of WAL records can be found as follows:
-- I checked for 1000 WAL records because each transaction contains
1000 inserts.
select count(counttime) from test_head where logtype
='FIND_DECODING_XLOG_RECORD_COUNT' and counttime < 1000;
count
-------
29
(1 row)

select count(counttime) from test_v26 where logtype
='FIND_DECODING_XLOG_RECORD_COUNT' and counttime < 1000;
count
-------
5
(1 row)
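
The same counts expressed as a percentage of all tables (a sketch using
the FILTER clause, assuming one FIND_DECODING_XLOG_RECORD_COUNT entry
per table as in this run):

select 100.0 * count(*) filter (where counttime < 1000) / count(*)
       as "pct within 1 commit"
from test_head
where logtype = 'FIND_DECODING_XLOG_RECORD_COUNT';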

Apart from these, there were other instances where V26 had to read more
COMMIT records in a few cases.
The above is happening for the reason mentioned in [2]: in HEAD the
tablesync worker is started right after a commit, so in some cases we
are able to create the consistent point before a new transaction is
started. Slot creation is fastest when the tablesync worker manages to
connect to the publisher and create a consistent point before the next
transaction starts. The probability of this is better in HEAD for this
scenario, as the new tablesync worker is started just after a commit
and therefore has a better time window (because the current transaction
has only just started) before another transaction begins. This
probability is slightly lower with the V26 version.
Since this is purely a timing issue in a few cases, caused by when new
transactions happen to start while the slot is being created, I felt we
can ignore it.
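
The dependency on in-progress transactions can be seen directly with a
minimal two-session sketch on the publisher (t1 and demo_slot are
hypothetical names):

-- Session 1: leave a transaction open
begin;
insert into t1 values (1);

-- Session 2: this blocks until session 1 commits, because building the
-- consistent snapshot must wait for all in-progress transactions to end
select pg_create_logical_replication_slot('demo_slot', 'pgoutput');

-- Session 1: committing lets session 2's call complete
commit;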

[1] - https://www.postgresql.org/message-id/CALDaNm1TA068E2niJFUR9ig%2BYz3-ank%3Dj5%3Dj-2UocbzaDnQPrA%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CALDaNm2k2z3Hpa3Omb_tpxWkyHnUvsWjJMbqDs-2uD2eLzemJQ%40mail.gmail.com

Regards,
Vignesh

Attachment Content-Type Size
log_information_of_consistent_point_1_commit.txt text/plain 10.7 KB
