Re: Allow logical replication to copy tables in binary format

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
Subject: Re: Allow logical replication to copy tables in binary format
Date: 2023-03-14 00:06:46
Message-ID: CAHut+PsAS8HpjdbDv+RM-YUJaLO0UC3f5be+qN296+GrewsGXg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some review comments for patch v12-0001

======
General

1.
There is no new test code. Are we sure that there are already
sufficient TAP tests doing binary testing with/without copy_data and
covering all the necessary combinations?

======
Commit Message

2.
Without this patch, table are copied in text format even if the
subscription is created with binary option enabled. This patch allows
logical replication
to perform in binary format starting from initial sync. When binary
format is beneficial
to use, allowing the subscription to copy tables in binary in table
sync phase may
reduce the time spent on copy depending on column types.

~

a. "table" -> "tables"

b. I don't think need to keep referring to the initial table
synchronization many times.

SUGGESTION
Without this patch, table synchronization COPY uses text format even
if the subscription is created with the binary option enabled. Copying
tables in binary format may reduce the time spent depending on column
types.

======
doc/src/sgml/logical-replication.sgml

3.
@@ -241,10 +241,11 @@
types of the columns do not need to match, as long as the text
representation of the data can be converted to the target type. For
example, you can replicate from a column of type <type>integer</type> to a
- column of type <type>bigint</type>. The target table can also have
- additional columns not provided by the published table. Any such columns
- will be filled with the default value as specified in the definition of the
- target table.
+ column of type <type>bigint</type>. However, replication in
binary format is
+ type specific and does not allow to replicate data between different types
+ according to its restrictions. The target table can also have additional
+ columns not provided by the published table. Any such columns
will be filled
+ with the default value as specified in the definition of the target table.
</para>

I am not sure if there is enough information here about the binary restrictions.
- e.g. does the column order matter for tablesync COPY binary?
- e.g. no mention of the send/receive function requirements of tablesync COPY.

But maybe here is not the place to write all such details anyway;
Instead of duplicating information IMO here should give a link to the
CREATE SUBSCRIPTION notes -- something like:

SUGGESTION
Note that replication in binary format is more restrictive. See CREATE
SUBSCRIPTION binary subscription parameter for details.

======
doc/src/sgml/ref/create_subscription.sgml

4.
@@ -189,11 +189,17 @@ CREATE SUBSCRIPTION <replaceable
class="parameter">subscription_name</replaceabl
<term><literal>binary</literal> (<type>boolean</type>)</term>
<listitem>
<para>
- Specifies whether the subscription will request the publisher to
- send the data in binary format (as opposed to text).
- The default is <literal>false</literal>.
- Even when this option is enabled, only data types having
- binary send and receive functions will be transferred in binary.
+ Specifies whether the subscription will both copy the initial data to
+ synchronize relations and request the publisher to send the data in
+ binary format (as opposed to text). The default is
<literal>false</literal>.
+ Binary format can be faster than the text format, but it is
less portable
+ across machine architectures and PostgreSQL versions.
Binary format is
+ also very data type specific, it will not allow copying
between different
+ column types as opposed to text format. Even when this
option is enabled,
+ only data types having binary send and receive functions will be
+ transferred in binary. Note that the initial synchronization requires
+ all data types to have binary send and receive functions, otherwise
+ the synchronization will fail.
</para>

There seems to be a small ambiguity because this wording comes more
from our code-level understanding, rather than what the user sees.
e.g. I think "will be transferred" could mean also the COPY phase as
far as the user is concerned. Maybe some slight rewording can help.

There is also some use of "copy" (e.g. "will not allow copying") which
can be confused with the initial tablesync phase which is not what was
intended.

SUGGESTION
Specifies whether the subscription will request the publisher to send
the data in binary format (as opposed to text). The default is
<literal>false</literal>. Any initial table synchronization copy [link
to copy_data] also uses the same format. Using binary format can be
faster than the text format, but it is less portable across machine
architectures and PostgreSQL versions. Binary format is also data type
specific, it will not allow transfer between different column types as
opposed to text format. Even when the binary option is enabled, only
data types having binary send/receive functions can be transferred in
binary format. If these functions don't exist then the publisher send
will revert to sending text format. Note that the binary initial table
synchronization copy requires all data types to have binary
send/receive functions, otherwise it will fail.

======
src/backend/replication/logical/tablesync.c

5.
+
+ /*
+ * If the publisher is v14 or later, copy data in the required data format.
+ * If the publisher version is earlier, it doesn't support COPY with binary
+ * option.
+ */
+ if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 140000 &&
+ MySubscription->binary)
+ {
+ appendStringInfo(&cmd, " WITH (FORMAT binary)");
+ options = lappend(options, makeDefElem("format", (Node *)
makeString("binary"), -1));
+ }
+

5a.
I didn't think you need to say "copy data in the required data format".

Can’t you just say like:

SUGGESTION
If the publisher version is earlier than v14, it COPY command doesn't
support the binary option.

~

5b.
Does this also need to be mentioned as a note on the CREATE
SUBSCRIPTION docs page? e.g. COPY binary from server versions < v14
will work because it will just be skipping anyway and use text.

======
doc/src/sgml/ref/alter_subscription.sgml

6.
The v12 patch does not update the ALTER SUBSCRIPTION DOCS, but I
thought perhaps there should be some reference from the ALTER
copy_data back to the main CREATE SUBSCRIPTION page because if the
user leaves the default (copy_data=true) then the ALTER might cause
some unexpected errors is the subscription was already using binary
format.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-03-14 00:26:03 Re: psql \watch 2nd argument: iteration count
Previous Message Thomas Munro 2023-03-14 00:01:28 Re: windows CI failing PMSignalState->PMChildFlags[slot] == PM_CHILD_ASSIGNED