Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables

From: Keisuke Kuroda <keisuke(dot)kuroda(dot)3862(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Logical replication CPU-bound with TRUNCATE/DROP/CREATE many tables
Date: 2020-10-01 09:12:54
Message-ID: CANDwgg+DmVvk_dRmcpG4J+WS2wQmhJ-okikbH2oGF_Byg1_iSw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> > I test the patch on the master HEAD(9796f455) and it works fine.
> > * make installcheck-world: passed
> > * walsender process continues to use 100% of the CPU 1core when
> > TRUNCATE 1000 partition: 1s or less
> > ** before patch : 230s
>
> Does this result indicate that it is still CPU bound but it does the
> actual decoding and completes in 1s instead of spending 230s mainly to
> execute unnecessary invalidations?

Okay, to check the decodeing and invalidation times,
I got the time to return the results to pg_recvlogical and perf.

Before the patch, most of the processing was done
by hash_seq_search in the ReferenodeMapInvalidateCallback.
After the patch, this problem has been resolved.

# test

(1) start pg_recvlogical
(2) INSERT to 2000 partition for creating RelfilenodemapHash
(3) TRUNCATE 1000 partition

(1)
pg_recvlogical --create-slot --start -f - --if-not-exists \
--plugin=test_decoding --slot=test1 --dbname=postgres --username=postgres \
--option=include-timestamp | gawk '{ print strftime("%Y-%m-%d
%H:%M:%S"), $0; fflush(); }'

(2)
DO $$
DECLARE
i INT;
j INT;
schema TEXT;
tablename TEXT;
BEGIN
FOR i IN 1 .. 2 LOOP
schema := 'nsp_' || to_char(i, 'FM000');
tablename := 'tbl_' || to_char(i, 'FM000');
EXECUTE 'INSERT INTO ' || schema || '.' || tablename || '
SELECT i,i,i,0,0,0,0,0,0,0,0,0,''a''::bytea,''a''::bytea FROM
generate_series(0,9999,1) AS i';
END LOOP;
END;
$$ LANGUAGE plpgsql;

(3)
TRUNCATE TABLE nsp_001.tbl_001;

# before

(3) TRUNCATE
decode + invalidation = 222s

2020-10-01 16:58:29 BEGIN 529
2020-10-01 17:02:11 COMMIT 529 (at 2020-10-01 16:58:29.281747+09)

(Before the patch was applied, I was concerned that
the "table nsp_001.tbl_001 ... TRUNCATE" was not outputting.)

Samples: 228K of event 'cpu-clock:uhH', Event count (approx.): 57080750000
Overhead Command Shared Object Symbol
- 96.00% postgres postgres [.] hash_seq_search
hash_seq_search
- RelfilenodeMapInvalidateCallback
- LocalExecuteInvalidationMessage
ReorderBufferExecuteInvalidations
ReorderBufferProcessTXN
ReorderBufferCommit
DecodeCommit
DecodeXactOp
LogicalDecodingProcessRecord
XLogSendLogical
WalSndLoop
StartLogicalReplication
exec_replication_command
PostgresMain
BackendRun
BackendStartup
ServerLoop
PostmasterMain
main
__libc_start_main
_start
+ 0.77% postgres postgres [.] LocalExecuteInvalidationMessage
0.47% postgres postgres [.] hash_bytes_uint32
0.42% postgres postgres [.] hash_search_with_hash_value
0.37% postgres postgres [.] RelfilenodeMapInvalidateCallback
0.32% postgres postgres [.] CatCacheInvalidate
0.23% postgres postgres [.] PlanCacheRelCallback
0.23% postgres postgres [.] ReorderBufferExecuteInvalidations

# after

(3) TRUNCATE
decode + invalidation = 1s or less

2020-10-01 17:09:43 BEGIN 537
2020-10-01 17:09:43 table nsp_001.tbl_001, nsp_001.part_0001,
nsp_001.part_0002 ... nsp_001.part_0999, nsp_001.part_1000: TRUNCATE:
(no-flags)
2020-10-01 17:09:43 COMMIT 537 (at 2020-10-01 17:09:43.192989+09)

Samples: 337 of event 'cpu-clock:uhH', Event count (approx.): 84250000
Overhead Command Shared Object Symbol
- 47.48% postgres postgres [.] hash_seq_search
hash_seq_search
RelfilenodeMapInvalidateCallback
LocalExecuteInvalidationMessage
ReorderBufferExecuteInvalidations
ReorderBufferProcessTXN
ReorderBufferCommit
DecodeCommit
DecodeXactOp
LogicalDecodingProcessRecord
XLogSendLogical
WalSndLoop
StartLogicalReplication
exec_replication_command
PostgresMain
BackendRun
BackendStartup
ServerLoop
PostmasterMain
main
__libc_start_main
_start
+ 3.26% postgres postgres [.] heap_hot_search_buffer
+ 2.67% postgres postgres [.] pg_comp_crc32c_sse42
+ 2.08% postgres libc-2.17.so [.] __memcpy_ssse3_back
+ 2.08% postgres postgres [.] DecodeXLogRecord
+ 2.08% postgres postgres [.] hash_search_with_hash_value
+ 1.48% postgres libpthread-2.17.so [.] __libc_recv
+ 1.19% postgres libc-2.17.so [.] __memset_sse2
+ 1.19% postgres libpthread-2.17.so [.] __errno_location
+ 1.19% postgres postgres [.] LocalExecuteInvalidationMessage
+ 1.19% postgres postgres [.] ReadPageInternal
+ 1.19% postgres postgres [.] XLogReadRecord
+ 1.19% postgres postgres [.] socket_is_send_pending
hash_seq_search

Best Regards,
--
Keisuke Kuroda
NTT Software Innovation Center
keisuke(dot)kuroda(dot)3862(at)gmail(dot)com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Drouvot, Bertrand 2020-10-01 09:18:30 Re: Add information to rm_redo_error_callback()
Previous Message osumi.takamichi@fujitsu.com 2020-10-01 08:14:42 RE: Disable WAL logging to speed up data loading