Re: Perform streaming logical transactions by background workers and parallel apply

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-08-12 06:46:11
Message-ID: CAHut+PutJBXuoy3DB5S7X9sZitfZp4qsXfzgkt5C5=CTNtX-Ug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some review comments for v20-0003:

(Sorry - the reviews are time consuming, so I am lagging slightly
behind the latest posted version)

======

1. <General>

1a.
There are a few comment modifications in this patch (e.g. changing
FROM "in an apply background worker" TO "using an apply background
worker"). e.g. I noticed lots of these in worker.c but they might be
in other files too.

Although these are good changes, these are just tweaks to new comments
introduced by patch 0001, so IMO such changes belong in that patch,
not in this one.

1b.
Actually, there are still some comments says "by an apply background
worker///" and some saying "using an apply background worker..." and
some saying "in the apply background worker...". Maybe they are all
OK, but it will be better if all such can be searched and made to have
consistent wording

======

2. Commit message

2a.

Without these restrictions, the following scenario may occur:
The apply background worker lock a row when processing a streaming transaction,
after that the main apply worker tries to lock the same row when processing
another transaction. At this time, the main apply worker waits for the
streaming transaction to complete and the lock to be released, it won't send
subsequent data of the streaming transaction to the apply background worker;
the apply background worker waits to receive the rest of streaming transaction
and can't finish this transaction. Then the main apply worker will wait
indefinitely.

"background worker lock a row" -> "background worker locks a row"

"Then the main apply worker will wait indefinitely." -> really, you
already said the main apply worker is waiting, so I think this
sentence only needs to say: "Now a deadlock has occurred, so both
workers will wait indefinitely."

2b.

Text fragments are all common between:

i. This commit message
ii. Text in pgdocs CREATE SUBSCRIPTION
iii. Function comment for 'logicalrep_rel_mark_parallel_apply' in relation.c

After addressing other review comments please make sure all those 3
parts are worded same.

======

3. doc/src/sgml/ref/create_subscription.sgml

+ There are two requirements for using <literal>parallel</literal>
+ mode: 1) the unique column in the table on the subscriber-side should
+ also be the unique column on the publisher-side; 2) there cannot be
+ any non-immutable functions used by the subscriber-side replicated
+ table.

3a.
I am not sure – is "requirements" the correct word here, or maybe it
should be "prerequisites".

3b.
Is it correct to say "should also be", or should that say "must also be"?

======

4. src/backend/replication/logical/applybgworker.c -
apply_bgworker_relation_check

+ /*
+ * Skip check if not using apply background workers.
+ *
+ * If any worker is handling the streaming transaction, this check needs to
+ * be performed not only in the apply background worker, but also in the
+ * main apply worker. This is because without these restrictions, main
+ * apply worker may block apply background worker, which will cause
+ * infinite waits.
+ */
+ if (!am_apply_bgworker() &&
+ (list_length(ApplyBgworkersFreeList) == list_length(ApplyBgworkersList)))
+ return;

I struggled a bit to reconcile the comment with the condition. Is the
!am_apply_bgworker() part of this even needed – isn't the
list_length() check enough?

~~~

5.

+ /* We are in error mode and should give user correct error. */

I still [1, #3.4a] don't see the value in saying "should give correct
error" (e.g. what's the alternative?).

Maybe instead of that comment it can just say:
rel->parallel_apply = PARALLEL_APPLY_UNSAFE;

======

6. src/backend/replication/logical/proto.c - RelationGetUniqueKeyBitmap

+ /* Add referenced attributes to idindexattrs */
+ for (i = 0; i < indexRel->rd_index->indnatts; i++)
+ {
+ int attrnum = indexRel->rd_index->indkey.values[i];
+
+ /*
+ * We don't include non-key columns into idindexattrs
+ * bitmaps. See RelationGetIndexAttrBitmap.
+ */
+ if (attrnum != 0)
+ {
+ if (i < indexRel->rd_index->indnkeyatts &&
+ !bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber, attunique))
+ attunique = bms_add_member(attunique,
+ attrnum - FirstLowInvalidHeapAttributeNumber);
+ }
+ }

There are 2x comments in that code that are referring to
'idindexattrs' but I think it is a cut/paste problem because that
variable name does not even exist in this copied function.

======

7. src/backend/replication/logical/relation.c -
logicalrep_rel_mark_parallel_apply

+ /* Initialize the flag. */
+ entry->parallel_apply = PARALLEL_APPLY_SAFE;

I have unsuccessfully repeated the same review comment several times
[1 #3.8] suggesting that this flag should not be initialized to SAFE.
IMO the state should remain as UNKNOWN until you are either sure it is
SAFE, or sure it is UNSAFE. Anyway, I'll give up on this point now;
let's see what other people think.

======

8. src/include/replication/logicalrelation.h

+/*
+ * States to determine if changes on one relation can be applied using an
+ * apply background worker.
+ */
+typedef enum ParallelApplySafety
+{
+ PARALLEL_APPLY_UNKNOWN = 0, /* unknown */
+ PARALLEL_APPLY_SAFE, /* Can apply changes using an apply background
+ worker */
+ PARALLEL_APPLY_UNSAFE /* Can not apply changes using an apply
+ background worker */
+} ParallelApplySafety;
+

I think the values are self-explanatory so the comments for every
value add nothing here, particularly since the enum itself has a
comment saying the same thing. I'm not sure if you accidentally missed
my previous comment [1, #3.12b] about this, or just did not agree with
it.

======

9. .../subscription/t/015_stream.pl

+# "streaming = parallel" does not support non-immutable functions, so change
+# the function in the defult expression of column "c".
+$node_subscriber->safe_psql(
+ 'postgres', qq{
+ALTER TABLE test_tab ALTER COLUMN c SET DEFAULT to_timestamp(1284352323);
+ALTER SUBSCRIPTION tap_sub SET(streaming = parallel, binary = off);
+});

9a.
typo "defult"

9b.
The problem with to_timestamp(1284352323) is that it looks like it
must be some special value, but in fact AFAIK you don't care at all
what value timestamp this is. I think it would be better here to just
use to_timestamp(0) or to_timestamp(999) or similar so the number is
obviously not something of importance.

======

10. .../subscription/t/016_stream.pl

+# "streaming = parallel" does not support non-immutable functions, so change
+# the function in the defult expression of column "c".
+$node_subscriber->safe_psql(
+ 'postgres', qq{
+ALTER TABLE test_tab ALTER COLUMN c SET DEFAULT to_timestamp(1284352323);
+ALTER SUBSCRIPTION tap_sub SET(streaming = parallel);
+});

10a. Ditto 9a.
10b. Ditto 9b.

======

11. .../subscription/t/022_twophase_cascade.pl

+# "streaming = parallel" does not support non-immutable functions, so change
+# the function in the defult expression of column "c".
+$node_B->safe_psql(
+ 'postgres', "ALTER TABLE test_tab ALTER COLUMN c SET DEFAULT
to_timestamp(1284352323);");
+$node_C->safe_psql(
+ 'postgres', "ALTER TABLE test_tab ALTER COLUMN c SET DEFAULT
to_timestamp(1284352323);");
+

11a. Ditto 9a.
11b. Ditto 9b.

======

12. .../subscription/t/023_twophase_stream.pl

+# "streaming = parallel" does not support non-immutable functions, so change
+# the function in the defult expression of column "c".
+$node_subscriber->safe_psql(
+ 'postgres', qq{
+ALTER TABLE test_tab ALTER COLUMN c SET DEFAULT to_timestamp(1284352323);
+ALTER SUBSCRIPTION tap_sub SET(streaming = parallel);
+});

12a. Ditto 9a.
12b. Ditto 9b.

======

13. .../subscription/t/032_streaming_apply.pl

+# Drop default value on the subscriber, now it works.
+$node_subscriber->safe_psql('postgres',
+ "ALTER TABLE test_tab1 ALTER COLUMN b DROP DEFAULT");

Maybe for these tests like this it would be better to test if it works
OK using an immutable DEFAULT function instead of just completely
removing the bad function to make it work.

I think maybe the same was done for TRIGGER tests. There was a test
for a trigger with a bad function, and then the trigger was removed.
What about including a test for the trigger with a good function?

------
[1] https://www.postgresql.org/message-id/CAHut%2BPv9cKurDQHtk-ygYp45-8LYdE%3D4sMZY-8UmbeDTGgECVg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2022-08-12 06:59:30 Re: fix typos
Previous Message John Naylor 2022-08-12 06:01:25 build remaining Flex files standalone