Re: Perform streaming logical transactions by background workers and parallel apply

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Date: 2023-01-24 07:18:35
Message-ID: CAHut+Pvw0eBGu+pe9jRBdC+29MO5Fep4CTf92muhV5Ji+P9tcw@mail.gmail.com
Lists: pgsql-hackers

Here are some review comments for v86-0002

======
Commit message

1.
Use the use the existing developer option logical_replication_mode to test the
parallel apply of large transaction on subscriber.

~

Typo “Use the use the”

SUGGESTION (rewritten)
Give additional functionality to the existing developer option
'logical_replication_mode' to help test parallel apply of large
transactions on the subscriber.

~~~

2.
Maybe the commit message should also mention the extra TAP tests that have
been added to exercise the serialization part of the parallel apply?

BTW – I can see the TAP tests are testing full serialization (when the
GUC is 'immediate'), but I am not sure how "partial" serialization
(when it has to switch halfway from shmem to files) is being tested.

======
doc/src/sgml/config.sgml

3.
Allows streaming or serializing changes immediately in logical
decoding. The allowed values of logical_replication_mode are buffered
and immediate. When set to immediate, stream each change if streaming
option (see optional parameters set by CREATE SUBSCRIPTION) is
enabled, otherwise, serialize each change. When set to buffered, which
is the default, decoding will stream or serialize changes when
logical_decoding_work_mem is reached.
On the subscriber side, if streaming option is set to parallel, this
parameter also allows the leader apply worker to send changes to the
shared memory queue or to serialize changes. When set to buffered, the
leader sends changes to parallel apply workers via shared memory
queue. When set to immediate, the leader serializes all changes to
files and notifies the parallel apply workers to read and apply them
at the end of the transaction.

~

Because this same developer GUC now affects the publisher side and the
subscriber side differently, IMO this whole description should be
restructured accordingly.

SUGGESTION (something like)

The allowed values of logical_replication_mode are buffered and
immediate. The default is buffered.

On the publisher side, ...

On the subscriber side, ...

~~~

4.
This parameter is intended to be used to test logical decoding and
replication of large transactions for which otherwise we need to
generate the changes till logical_decoding_work_mem is reached.

~

Maybe this paragraph needs rewording or moving. e.g. Isn't that
misleading now? Although this might be an explanation for the
publisher side, it does not seem relevant to the subscriber side's
behaviour.

======
.../replication/logical/applyparallelworker.c

5.
@@ -1149,6 +1149,9 @@ pa_send_data(ParallelApplyWorkerInfo *winfo, Size
nbytes, const void *data)
Assert(!IsTransactionState());
Assert(!winfo->serialize_changes);

+ if (logical_replication_mode == LOGICAL_REP_MODE_IMMEDIATE)
+ return false;
+

I felt that code should have some comment, even if it is just
something quite basic like "/* For developer testing */"

======
.../t/018_stream_subxact_abort.pl

6.
+# Clean up test data from the environment.
+$node_publisher->safe_psql('postgres', "TRUNCATE TABLE test_tab_2");
+$node_publisher->wait_for_catchup($appname);

Is it necessary to TRUNCATE the table here? If everything is working
shouldn't the data be rolled back anyway?

~~~

7.
+$node_publisher->safe_psql(
+ 'postgres', q{
+ BEGIN;
+ INSERT INTO test_tab_2 values(1);
+ SAVEPOINT sp;
+ INSERT INTO test_tab_2 values(1);
+ ROLLBACK TO sp;
+ COMMIT;
+ });

Perhaps this should insert 2 different values so that the verification
code can check that the correct value remains, instead of just checking
COUNT(*)?

------
Kind Regards,
Peter Smith.
Fujitsu Australia
