RE: Perform streaming logical transactions by background workers and parallel apply

From: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-08-16 07:33:04
Message-ID: OS3PR01MB6275739E73E8BEC5D13FB6739E6B9@OS3PR01MB6275.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, August 12, 2022 12:46 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> Here are some review comments for v20-0003:
>
> (Sorry - the reviews are time consuming, so I am lagging slightly
> behind the latest posted version)

Thanks for your comments.

> 1. <General>
>
> 1a.
> There are a few comment modifications in this patch (e.g. changing
> FROM "in an apply background worker" TO "using an apply background
> worker"). e.g. I noticed lots of these in worker.c but they might be
> in other files too.
>
> Although these are good changes, these are just tweaks to new comments
> introduced by patch 0001, so IMO such changes belong in that patch,
> not in this one.
>
> 1b.
> Actually, there are still some comments says "by an apply background
> worker///" and some saying "using an apply background worker..." and
> some saying "in the apply background worker...". Maybe they are all
> OK, but it will be better if all such can be searched and made to have
> consistent wording

Improved.

> 2. Commit message
>
> 2a.
>
> Without these restrictions, the following scenario may occur:
> The apply background worker lock a row when processing a streaming
> transaction,
> after that the main apply worker tries to lock the same row when processing
> another transaction. At this time, the main apply worker waits for the
> streaming transaction to complete and the lock to be released, it won't send
> subsequent data of the streaming transaction to the apply background worker;
> the apply background worker waits to receive the rest of streaming transaction
> and can't finish this transaction. Then the main apply worker will wait
> indefinitely.
>
> "background worker lock a row" -> "background worker locks a row"
>
> "Then the main apply worker will wait indefinitely." -> really, you
> already said the main apply worker is waiting, so I think this
> sentence only needs to say: "Now a deadlock has occurred, so both
> workers will wait indefinitely."
>
> 2b.
>
> Text fragments are all common between:
>
> i. This commit message
> ii. Text in pgdocs CREATE SUBSCRIPTION
> iii. Function comment for 'logicalrep_rel_mark_parallel_apply' in relation.c
>
> After addressing other review comments please make sure all those 3
> parts are worded same.

Improved.

> 3. doc/src/sgml/ref/create_subscription.sgml
>
> + There are two requirements for using <literal>parallel</literal>
> + mode: 1) the unique column in the table on the subscriber-side should
> + also be the unique column on the publisher-side; 2) there cannot be
> + any non-immutable functions used by the subscriber-side replicated
> + table.
>
> 3a.
> I am not sure – is "requirements" the correct word here, or maybe it
> should be "prerequisites".
>
> 3b.
> Is it correct to say "should also be", or should that say "must also be"?

Improved.

> 4. src/backend/replication/logical/applybgworker.c -
> apply_bgworker_relation_check
>
> + /*
> + * Skip check if not using apply background workers.
> + *
> + * If any worker is handling the streaming transaction, this check needs to
> + * be performed not only in the apply background worker, but also in the
> + * main apply worker. This is because without these restrictions, main
> + * apply worker may block apply background worker, which will cause
> + * infinite waits.
> + */
> + if (!am_apply_bgworker() &&
> + (list_length(ApplyBgworkersFreeList) == list_length(ApplyBgworkersList)))
> + return;
>
> I struggled a bit to reconcile the comment with the condition. Is the
> !am_apply_bgworker() part of this even needed – isn't the
> list_length() check enough?

We need to check this for apply bgworker. (Both lists are "NIL" in apply
bgworker.)

> 5.
>
> + /* We are in error mode and should give user correct error. */
>
> I still [1, #3.4a] don't see the value in saying "should give correct
> error" (e.g. what's the alternative?).
>
> Maybe instead of that comment it can just say:
> rel->parallel_apply = PARALLEL_APPLY_UNSAFE;

I changed if-statement to report the error:
If 'parallel_apply' isn't 'PARALLEL_APPLY_SAFE', then report the error.

> 6. src/backend/replication/logical/proto.c - RelationGetUniqueKeyBitmap
>
> + /* Add referenced attributes to idindexattrs */
> + for (i = 0; i < indexRel->rd_index->indnatts; i++)
> + {
> + int attrnum = indexRel->rd_index->indkey.values[i];
> +
> + /*
> + * We don't include non-key columns into idindexattrs
> + * bitmaps. See RelationGetIndexAttrBitmap.
> + */
> + if (attrnum != 0)
> + {
> + if (i < indexRel->rd_index->indnkeyatts &&
> + !bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber, attunique))
> + attunique = bms_add_member(attunique,
> + attrnum - FirstLowInvalidHeapAttributeNumber);
> + }
> + }
>
> There are 2x comments in that code that are referring to
> 'idindexattrs' but I think it is a cut/paste problem because that
> variable name does not even exist in this copied function.

Fixed the comments.

> 7. src/backend/replication/logical/relation.c -
> logicalrep_rel_mark_parallel_apply
>
> + /* Initialize the flag. */
> + entry->parallel_apply = PARALLEL_APPLY_SAFE;
>
> I have unsuccessfully repeated the same review comment several times
> [1 #3.8] suggesting that this flag should not be initialized to SAFE.
> IMO the state should remain as UNKNOWN until you are either sure it is
> SAFE, or sure it is UNSAFE. Anyway, I'll give up on this point now;
> let's see what other people think.

Okay, I will follow the relevant comments later.

> 8. src/include/replication/logicalrelation.h
>
> +/*
> + * States to determine if changes on one relation can be applied using an
> + * apply background worker.
> + */
> +typedef enum ParallelApplySafety
> +{
> + PARALLEL_APPLY_UNKNOWN = 0, /* unknown */
> + PARALLEL_APPLY_SAFE, /* Can apply changes using an apply background
> + worker */
> + PARALLEL_APPLY_UNSAFE /* Can not apply changes using an apply
> + background worker */
> +} ParallelApplySafety;
> +
>
> I think the values are self-explanatory so the comments for every
> value add nothing here, particularly since the enum itself has a
> comment saying the same thing. I'm not sure if you accidentally missed
> my previous comment [1, #3.12b] about this, or just did not agree with
> it.

Changed.

> 9. .../subscription/t/015_stream.pl
>
> +# "streaming = parallel" does not support non-immutable functions, so change
> +# the function in the defult expression of column "c".
> +$node_subscriber->safe_psql(
> + 'postgres', qq{
> +ALTER TABLE test_tab ALTER COLUMN c SET DEFAULT
> to_timestamp(1284352323);
> +ALTER SUBSCRIPTION tap_sub SET(streaming = parallel, binary = off);
> +});
>
> 9a.
> typo "defult"
>
> 9b.
> The problem with to_timestamp(1284352323) is that it looks like it
> must be some special value, but in fact AFAIK you don't care at all
> what value timestamp this is. I think it would be better here to just
> use to_timestamp(0) or to_timestamp(999) or similar so the number is
> obviously not something of importance.
>
> ======
>
> 10. .../subscription/t/016_stream.pl
>
> +# "streaming = parallel" does not support non-immutable functions, so change
> +# the function in the defult expression of column "c".
> +$node_subscriber->safe_psql(
> + 'postgres', qq{
> +ALTER TABLE test_tab ALTER COLUMN c SET DEFAULT
> to_timestamp(1284352323);
> +ALTER SUBSCRIPTION tap_sub SET(streaming = parallel);
> +});
>
> 10a. Ditto 9a.
> 10b. Ditto 9b.
>
> ======
>
> 11. .../subscription/t/022_twophase_cascade.pl
>
> +# "streaming = parallel" does not support non-immutable functions, so change
> +# the function in the defult expression of column "c".
> +$node_B->safe_psql(
> + 'postgres', "ALTER TABLE test_tab ALTER COLUMN c SET DEFAULT
> to_timestamp(1284352323);");
> +$node_C->safe_psql(
> + 'postgres', "ALTER TABLE test_tab ALTER COLUMN c SET DEFAULT
> to_timestamp(1284352323);");
> +
>
> 11a. Ditto 9a.
> 11b. Ditto 9b.
>
> ======
>
> 12. .../subscription/t/023_twophase_stream.pl
>
> +# "streaming = parallel" does not support non-immutable functions, so change
> +# the function in the defult expression of column "c".
> +$node_subscriber->safe_psql(
> + 'postgres', qq{
> +ALTER TABLE test_tab ALTER COLUMN c SET DEFAULT
> to_timestamp(1284352323);
> +ALTER SUBSCRIPTION tap_sub SET(streaming = parallel);
> +});
>
> 12a. Ditto 9a.
> 12b. Ditto 9b.

Improved.

> 13. .../subscription/t/032_streaming_apply.pl
>
> +# Drop default value on the subscriber, now it works.
> +$node_subscriber->safe_psql('postgres',
> + "ALTER TABLE test_tab1 ALTER COLUMN b DROP DEFAULT");
>
> Maybe for these tests like this it would be better to test if it works
> OK using an immutable DEFAULT function instead of just completely
> removing the bad function to make it work.
>
> I think maybe the same was done for TRIGGER tests. There was a test
> for a trigger with a bad function, and then the trigger was removed.
> What about including a test for the trigger with a good function?

Improved.

Attach the new patches.

Regards,
Wang wei

Attachment Content-Type Size
v22-0001-Perform-streaming-logical-transactions-by-backgr.patch application/octet-stream 114.3 KB
v22-0002-Test-streaming-parallel-option-in-tap-test.patch application/octet-stream 69.1 KB
v22-0003-Add-some-checks-before-using-apply-background-wo.patch application/octet-stream 54.6 KB
v22-0004-Retry-to-apply-streaming-xact-only-in-apply-work.patch application/octet-stream 30.6 KB
v22-0005-Add-a-main_worker_pid-to-pg_stat_subscription.patch application/octet-stream 6.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message wangw.fnst@fujitsu.com 2022-08-16 07:37:00 RE: Perform streaming logical transactions by background workers and parallel apply
Previous Message Julien Rouhaud 2022-08-16 06:58:43 Re: Expose Parallelism counters planned/execute in pg_stat_statements