Re: Perform streaming logical transactions by background workers and parallel apply

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-11-18 07:03:20
Message-ID: CAHut+Ps7TzqqDnuH8r_ct1W_zSBCnuo3wodMt4Y8_Gw7rSRAaw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some review comments for v47-0001

(This review is a WIP - I will post more comments for this patch next week)

======

.../replication/logical/applyparallelworker.c

1.

+ * Copyright (c) 2022, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION src/backend/replication/logical/applyparallelworker.c
+ *

This IDENTIFICATION should be on 2 lines like it previously was
instead of wrapped into one line. For consistency with all other file
headers.

~~~

2. File header comment

+ * Since the database structure (schema of subscription tables, etc.) of
+ * publisher and subscriber may be different.

Incomplete sentence?

~~~

3.

+ * When the following two scenarios occur, a deadlock occurs.

Actually, you described three scenarios in this comment. Not two.

SUGGESTION
The following scenarios can cause a deadlock.

~~~

4.

+ * LA (waiting to acquire the local transaction lock) -> PA1 (waiting to
+ * acquire the lock on the unique index) -> PA2 (waiting to acquire the lock on
+ * the remote transaction) -> LA

"PA1" -> "PA-1"
"PA2" -> "PA-2"

~~~

5.

+ * To resolve this issue, we use non-blocking write and wait with a timeout. If
+ * timeout is exceeded, the LA report an error and restart logical replication.

"report" --> "reports"
"restart" -> "restarts"

OR

"LA report" -> "LA will report"

~~~

6. pa_wait_for_xact_state

+/*
+ * Wait until the parallel apply worker's transaction state reach or exceed the
+ * given xact_state.
+ */
+static void
+pa_wait_for_xact_state(ParallelApplyWorkerShared *wshared,
+ ParallelTransState xact_state)

"reach or exceed" -> "reaches or exceeds"

~~~

7. pa_stream_abort

+ /*
+ * Although the lock can be automatically released during transaction
+ * rollback, but we still release the lock here as we may not in a
+ * transaction.
+ */
+ pa_unlock_transaction(xid, AccessShareLock);

"but we still" -> "we still"
"we may not in a" -> "we may not be in a"

~~~

8.

+ pa_savepoint_name(MySubscription->oid, subxid, spname,
+ sizeof(spname));
+

Unnecessary wrapping

~~~

9.

+ for (i = list_length(subxactlist) - 1; i >= 0; i--)
+ {
+ TransactionId xid_tmp = lfirst_xid(list_nth_cell(subxactlist, i));
+
+ if (xid_tmp == subxid)
+ {
+ found = true;
+ break;
+ }
+ }
+
+ if (found)
+ {
+ RollbackToSavepoint(spname);
+ CommitTransactionCommand();
+ subxactlist = list_truncate(subxactlist, i + 1);
+ }

This code logic does not seem to require the 'found' flag. You can do
the RollbackToSavepoint/CommitTransactionCommand/list_truncate before
the break.

~~~

10. pa_lock/unlock _stream/_transaction

+/*
+ * Helper functions to acquire and release a lock for each stream block.
+ *
+ * Set locktag_field4 to 0 to indicate that it's a stream lock.
+ */

+/*
+ * Helper functions to acquire and release a lock for each local transaction.
+ *
+ * Set locktag_field4 to 1 to indicate that it's a transaction lock.

Should constants/defines/enums replace those magic numbers 0 and 1?

~~~

11. pa_lock_transaction

+ * Note that all the callers are passing remote transaction ID instead of local
+ * transaction ID as xid. This is because the local transaction ID will only be
+ * assigned while applying the first change in the parallel apply, but it's
+ * possible that the first change in parallel apply worker is blocked by a
+ * concurrently executing transaction in another parallel apply worker causing
+ * the leader cannot get local transaction ID.

"causing the leader cannot" -> "which means the leader cannot" (??)

======

src/backend/replication/logical/worker.c

12. TransApplyAction

+/*
+ * What action to take for the transaction.
+ *
+ * TRANS_LEADER_APPLY:
+ * The action means that we are in the leader apply worker and changes of the
+ * transaction are applied directly in the worker.
+ *
+ * TRANS_LEADER_SERIALIZE:
+ * It means that we are in the leader apply worker or table sync worker.
+ * Changes are written to temporary files and then applied when the final
+ * commit arrives.
+ *
+ * TRANS_LEADER_SEND_TO_PARALLEL:
+ * The action means that we are in the leader apply worker and need to send the
+ * changes to the parallel apply worker.
+ *
+ * TRANS_PARALLEL_APPLY:
+ * The action that we are in the parallel apply worker and changes of the
+ * transaction are applied directly in the worker.
+ */
+typedef enum

12a
Too many various ways of saying the same thing:

"The action means that we..."
"It means that we..."
"The action that we..." (typo?)

Please word all these comments consistently

~

12b.
"directly in the worker" -> "directly by the worker" (??) 2x

~~~

13. get_worker_name

+/*
+ * Return the name of the logical replication worker.
+ */
+static const char *
+get_worker_name(void)
+{
+ if (am_tablesync_worker())
+ return _("logical replication table synchronization worker");
+ else if (am_parallel_apply_worker())
+ return _("logical replication parallel apply worker");
+ else
+ return _("logical replication apply worker");
+}

This function belongs nearer the top of the module (above all the
error messages that are using it).

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2022-11-18 07:13:01 Re: contrib: auth_delay module
Previous Message Bharath Rupireddy 2022-11-18 06:06:34 Re: Avoid double lookup in pgstat_fetch_stat_tabentry()