Re: Perform streaming logical transactions by background workers and parallel apply

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-07-27 03:37:06
Message-ID: CAHut+Pv9cKurDQHtk-ygYp45-8LYdE=4sMZY-8UmbeDTGgECVg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some review comments for patch v19-0003:

======

3.1 doc/src/sgml/ref/create_subscription.sgml

@@ -240,6 +240,10 @@ CREATE SUBSCRIPTION <replaceable
class="parameter">subscription_name</replaceabl
transaction is committed. Note that if an error happens when
applying changes in a background worker, the finish LSN of the
remote transaction might not be reported in the server log.
+ <literal>parallel</literal> mode has two requirements: 1) the unique
+ column in the relation on the subscriber-side should also be the
+ unique column on the publisher-side; 2) there cannot be any
+ non-immutable functions used by the subscriber-side replicated table.
</para>

3.1a.
It looked a bit strange starting the sentence with the enum
"<literal>parallel</literal> mode". Maybe reword it something like:

"This mode has two requirements: ..."
or
"There are two requirements for using <literal>parallel</literal> mode: ..."

3.1b.
Point 1) says "relation", but point 2) says "table". I think the
consistent term should be used.

======

3.2 <general>

For consistency, please search all this patch and replace every:

"... applied by an apply background worker" -> "... applied using an
apply background worker"

And also search/replace every:

"... in the apply background worker: -> "... using an apply background worker"

======

3.3 .../replication/logical/applybgworker.c

@@ -800,3 +800,47 @@ apply_bgworker_subxact_info_add(TransactionId current_xid)
MemoryContextSwitchTo(oldctx);
}
}
+
+/*
+ * Check if changes on this relation can be applied by an apply background
+ * worker.
+ *
+ * Although the commit order is maintained only allowing one process to commit
+ * at a time, the access order to the relation has changed. This could cause
+ * unexpected problems if the unique column on the replicated table is
+ * inconsistent with the publisher-side or contains non-immutable functions
+ * when applying transactions in the apply background worker.
+ */
+void
+apply_bgworker_relation_check(LogicalRepRelMapEntry *rel)

"only allowing" -> "by only allowing" (I think you mean this, right?)

~~~

3.4

+ /*
+ * Return if changes on this relation can be applied by an apply background
+ * worker.
+ */
+ if (rel->parallel_apply == PARALLEL_APPLY_SAFE)
+ return;
+
+ /* We are in error mode and should give user correct error. */
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot replicate target relation \"%s.%s\" using "
+ "subscription parameter streaming=parallel",
+ rel->remoterel.nspname, rel->remoterel.relname),
+ errdetail("The unique column on subscriber is not the unique "
+ "column on publisher or there is at least one "
+ "non-immutable function."),
+ errhint("Please change to use subscription parameter "
+ "streaming=on.")));

3.4a.
Of course, the code should give the user the "correct error" if there
is an error (!), but having a comment explicitly saying so does not
serve any purpose.

3.4b.
The logic might be simplified if it was written differently like:

+ if (rel->parallel_apply != PARALLEL_APPLY_SAFE)
+ ereport(ERROR, ...

======

3.5 src/backend/replication/logical/proto.c

@@ -40,6 +41,68 @@ static void logicalrep_read_tuple(StringInfo in,
LogicalRepTupleData *tuple);
static void logicalrep_write_namespace(StringInfo out, Oid nspid);
static const char *logicalrep_read_namespace(StringInfo in);

+static Bitmapset *RelationGetUniqueKeyBitmap(Relation rel);
+
+/*
+ * RelationGetUniqueKeyBitmap -- get a bitmap of unique attribute numbers
+ *
+ * This is similar to RelationGetIdentityKeyBitmap(), but returns a bitmap of
+ * index attribute numbers for all unique indexes.
+ */
+static Bitmapset *
+RelationGetUniqueKeyBitmap(Relation rel)

Why is the forward declaration needed when the static function
immediately follows it?

======

3.6 src/backend/replication/logical/relation.c -
logicalrep_relmap_reset_parallel_cb

@@ -91,6 +98,26 @@ logicalrep_relmap_invalidate_cb(Datum arg, Oid reloid)
}
}

+/*
+ * Relcache invalidation callback to reset parallel flag.
+ */
+static void
+logicalrep_relmap_reset_parallel_cb(Datum arg, int cacheid, uint32 hashvalue)

"reset parallel flag" -> "reset parallel_apply flag"

~~~

3.7 src/backend/replication/logical/relation.c -
logicalrep_rel_mark_parallel_apply

+ * There are two requirements for applying changes in an apply background
+ * worker: 1) The unique column in the relation on the subscriber-side should
+ * also be the unique column on the publisher-side; 2) There cannot be any
+ * non-immutable functions used by the subscriber-side.

This comment should exactly match the help text. See review comment #3.1

~~~

3.8

+ /* Initialize the flag. */
+ entry->parallel_apply = PARALLEL_APPLY_SAFE;

I previously suggested [1] (#3.6b) to move this. Consider, that you
cannot logically flag the entry as "safe" until you are certain that
it is safe. And you cannot be sure of that until you've passed all the
checks this function is doing. Therefore IMO the assignment to
PARALLEL_APPLY_SAFE should be the last line of the function.

~~~

3.9

+ /*
+ * Then, check if there is any non-immutable function used by the local
+ * table. Look for functions in the following places:
+ * a. trigger functions;
+ * b. Column default value expressions and domain constraints;
+ * c. Constraint expressions;
+ * d. Foreign keys.
+ */

"used by the local table" -> "used by the subscriber-side relation"
(reworded so that it is consistent with the First comment)

~~~

3.10

I previously suggested [1] (#3.7) to use goto in this function to
avoid the excessive number of returns. IMO there is nothing inherently
evil about gotos, so long as they are used with care - sometimes they
are the best option. Anyway, I attached some BEFORE/AFTER example code
to this post - others can judge which way is preferable.

======

3.11 src/backend/utils/cache/typcache.c - GetDomainConstraints

@@ -2540,6 +2540,23 @@ compare_values_of_enum(TypeCacheEntry *tcache,
Oid arg1, Oid arg2)
return 0;
}

+/*
+ * GetDomainConstraints --- get DomainConstraintState list of
specified domain type
+ */
+List *
+GetDomainConstraints(Oid type_id)
+{
+ TypeCacheEntry *typentry;
+ List *constraints = NIL;
+
+ typentry = lookup_type_cache(type_id, TYPECACHE_DOMAIN_CONSTR_INFO);
+
+ if(typentry->domainData != NULL)
+ constraints = typentry->domainData->constraints;
+
+ return constraints;
+}

This function can be simplified (if you want). e.g.

List *
GetDomainConstraints(Oid type_id)
{
TypeCacheEntry *typentry;

typentry = lookup_type_cache(type_id, TYPECACHE_DOMAIN_CONSTR_INFO);

return typentry->domainData ? typentry->domainData->constraints : NIL;
}

======

3.12 src/include/replication/logicalrelation.h

@@ -15,6 +15,19 @@
#include "access/attmap.h"
#include "replication/logicalproto.h"

+/*
+ * States to determine if changes on one relation can be applied using an
+ * apply background worker.
+ */
+typedef enum ParalleApplySafety
+{
+ PARALLEL_APPLY_UNKNOWN = 0, /* unknown */
+ PARALLEL_APPLY_SAFE, /* Can apply changes in an apply background
+ worker */
+ PARALLEL_APPLY_UNSAFE /* Can not apply changes in an apply background
+ worker */
+} ParalleApplySafety;
+

3.12a
Typo in enum and typedef names:
"ParalleApplySafety" -> "ParallelApplySafety"

3.12b
I think the values are quite self-explanatory now. Commenting on each
of them separately is not really adding anything useful.

3.12c.
New enum missing from typedefs.list?

======

3.13 typdefs.list

Should include the new typedef. See comment #3.12c.

------
[1] https://www.postgresql.org/message-id/OS3PR01MB62758A6AAED27B3A848CEB7A9E8F9%40OS3PR01MB6275.jpnprd01.prod.outlook.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment Content-Type Size
logicalrep_rel_mark_parallel_apply-with-goto.txt text/plain 4.2 KB
logicalrep_rel_mark_parallel_apply-without-goto.txt text/plain 4.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-07-27 04:34:48 Re: pgsql: Remove the restriction that the relmap must be 512 bytes.
Previous Message David Rowley 2022-07-27 03:31:08 Re: [PoC] Reducing planning time when tables have many partitions