Re: Allow logical replication to copy tables in binary format

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, Jelte Fennema <postgres(at)jeltef(dot)nl>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>
Subject: Re: Allow logical replication to copy tables in binary format
Date: 2023-02-28 10:59:22
Message-ID: CALDaNm3LVgvfS7wM0aP2G5ia-Vjjnf0zeJPaCGiWw9AZL_03hg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 28 Feb 2023 at 01:22, Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> wrote:
>
> Hi,
>
> Thanks for all of your reviews!
>
> So, I made some changes in the v10 according to your comments.
>
> 1- copy_format is changed to a boolean parameter copy_binary.
> It sounds easier to use a boolean to enable/disable binary copy. Default value is false, so nothing changes in the current behaviour if copy_binary is not specified.
> It's still persisted into the catalog. This is needed since its value will be needed by tablesync workers later. And tablesync workers fetch subscription configurations from the catalog.
> In the copy_data case, it is not directly stored anywhere but it affects the state of the table which is stored in the catalog. So, I guess this is the convenient way if we decide to go with a new parameter.
>
> 2- copy_binary is not tied to another parameter
> The patch does not disallow any combination of copy_binary with binary and copy_data options. I tried to explain what binary copy can and cannot do in the docs. Rest is up to the user now.
>
> 3- Removed version check for copy_binary
> HEAD already does not have any check for binary option. Making binary copy work only if publisher and subscriber are the same major version can be too restricted.
>
> 4- Added separate test file
> Although I believe 002_types.pl and 014_binary.pl can be improved more for binary enabled subscription cases, this patch might not be the best place to make those changes.
> 032_binary_copy.pl now has the tests for binary copy option. There are also some regression tests in subscription.sql.
>
> Finally, some other small fixes are done according to the reviews.
>
> Also, thanks Bharath for performance testing [1]. It can be seen that the patch has some benefits.
>
> I appreciate further thoughts/reviews.

Thanks for the patch, Few comments:
1) Are primary key required for the tables, if not required we could
remove the primary key which will speed up the test by not creating
the indexes and inserting in the indexes. Even if required just create
for one of the tables:
+# Create tables on both sides of the replication
+my $ddl = qq(
+ CREATE TABLE public.test_numerical (
+ a INTEGER PRIMARY KEY,
+ b NUMERIC,
+ c FLOAT,
+ d BIGINT
+ );
+ CREATE TABLE public.test_arrays (
+ a INTEGER[] PRIMARY KEY,
+ b NUMERIC[],
+ c TEXT[]
+ );
+ CREATE TABLE public.test_range_array (
+ a INTEGER PRIMARY KEY,
+ b TSTZRANGE,
+ c int8range[]
+ );
+ CREATE TYPE public.test_comp_basic_t AS (a FLOAT, b TEXT, c INTEGER);
+ CREATE TABLE public.test_one_comp (
+ a INTEGER PRIMARY KEY,
+ b public.test_comp_basic_t
+ ););
+

2) We can have the Insert/Select of tables in the same order so that
it is easy to verify. test_range_array/test_one_comp
insertion/selection order was different.
+# Insert some content and before creating a subscription
+$node_publisher->safe_psql(
+ 'postgres', qq(
+ INSERT INTO public.test_numerical (a, b, c, d) VALUES
+ (1, 1.2, 1.3, 10),
+ (2, 2.2, 2.3, 20);
+ INSERT INTO public.test_arrays (a, b, c) VALUES
+ ('{1,2,3}', '{1.1, 1.2, 1.3}', '{"one", "two", "three"}'),
+ ('{3,1,2}', '{1.3, 1.1, 1.2}', '{"three", "one", "two"}');
+ INSERT INTO test_range_array (a, b, c) VALUES
+ (1, tstzrange('Mon Aug 04 00:00:00 2014
CEST'::timestamptz, 'infinity'), '{"[1,2]", "[10,20]"}'),
+ (2, tstzrange('Sat Aug 02 00:00:00 2014
CEST'::timestamptz, 'Mon Aug 04 00:00:00 2014 CEST'::timestamptz),
'{"[2,3]", "[20,30]"}');
+ INSERT INTO test_one_comp (a, b) VALUES
+ (1, ROW(1.0, 'a', 1)),
+ (2, ROW(2.0, 'b', 2));
+ ));
+
+# Create the subscription with copy_binary = true
+my $publisher_connstring = $node_publisher->connstr . ' dbname=postgres';
+$node_subscriber->safe_psql('postgres',
+ "CREATE SUBSCRIPTION tsub CONNECTION '$publisher_connstring' "
+ . "PUBLICATION tpub WITH (slot_name = tpub_slot, copy_binary
= true)");
+
+# Ensure nodes are in sync with each other
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'tsub');
+
+my $sync_check = qq(
+ SET timezone = '+2';
+ SELECT a, b, c, d FROM test_numerical ORDER BY a;
+ SELECT a, b, c FROM test_arrays ORDER BY a;
+ SELECT a, b FROM test_one_comp ORDER BY a;
+ SELECT a, b, c FROM test_range_array ORDER BY a;
+);

3) Should we check the results for test_myvarchar table only, since
there is no change in other tables, we need not check other tables
again:
+# Now tablesync should succeed
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'tsub');
+
+$sync_check = qq(
+ SET timezone = '+2';
+ SELECT a, b, c, d FROM test_numerical ORDER BY a;
+ SELECT a, b, c FROM test_arrays ORDER BY a;
+ SELECT a, b FROM test_one_comp ORDER BY a;
+ SELECT a, b, c FROM test_range_array ORDER BY a;
+ SELECT a FROM test_myvarchar;
+);
+
+# Check the synced data on subscriber
+$result = $node_subscriber->safe_psql('postgres', $sync_check);

4) Similarly check only for test_mismatching_types in this case:
+# Cannot sync due to type mismatch
+$node_subscriber->wait_for_log(qr/ERROR: ( [A-Z0-9]+:)? incorrect
binary data format/);
+
+# Setting copy_binary to false should allow syncing
+$node_subscriber->safe_psql(
+ 'postgres', qq(
+ ALTER SUBSCRIPTION tsub SET (copy_binary = false);));
+
+$node_subscriber->wait_for_subscription_sync($node_publisher, 'tsub');
+
+$sync_check = qq(
+ SET timezone = '+2';
+ SELECT a, b, c, d FROM test_numerical ORDER BY a;
+ SELECT a, b, c FROM test_arrays ORDER BY a;
+ SELECT a, b FROM test_one_comp ORDER BY a;
+ SELECT a, b, c FROM test_range_array ORDER BY a;
+ SELECT a FROM test_myvarchar;
+ SELECT a FROM test_mismatching_types ORDER BY a;
+);
+
+# Check the synced data on subscribers
+$result = $node_subscriber->safe_psql('postgres', $sync_check);
+
+is( $result, '1|1.2|1.3|10
+2|2.2|2.3|20
+{1,2,3}|{1.1,1.2,1.3}|{one,two,three}
+{3,1,2}|{1.3,1.1,1.2}|{three,one,two}
+1|(1,a,1)
+2|(2,b,2)
+1|["2014-08-04 00:00:00+02",infinity)|{"[1,3)","[10,21)"}
+2|["2014-08-02 00:00:00+02","2014-08-04 00:00:00+02")|{"[2,4)","[20,31)"}
+a
+1
+2', 'check synced data on subscriber with copy_binary = false');

5) Should we change "Basic logical replication test" to "Test logical
replication of copy table in binary"
diff --git a/src/test/subscription/t/032_binary_copy.pl
b/src/test/subscription/t/032_binary_copy.pl
new file mode 100644
index 0000000000..bcad66e5ea
--- /dev/null
+++ b/src/test/subscription/t/032_binary_copy.pl
@@ -0,0 +1,223 @@
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Basic logical replication test
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+

6) We can change "Initialize publisher node" to "Create publisher
node" to maintain consistency.
+# Initialize publisher node
+my $node_publisher = PostgreSQL::Test::Cluster->new('publisher');
+$node_publisher->init(allows_streaming => 'logical');
+$node_publisher->start;
+
+# Create subscriber node
+my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$node_subscriber->init(allows_streaming => 'logical');
+$node_subscriber->start;

7) Should "Insert some content and before creating a subscription." be
changed to:
"Insert some content before creating a subscription."

+# Publish all tables
+$node_publisher->safe_psql('postgres',
+ "CREATE PUBLICATION tpub FOR ALL TABLES");
+
+# Insert some content and before creating a subscription
+$node_publisher->safe_psql(
+ 'postgres', qq(
+ INSERT INTO public.test_numerical (a, b, c, d) VALUES
+ (1, 1.2, 1.3, 10),

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2023-02-28 11:36:33 Re: SQL/JSON revisited
Previous Message Jeevan Ladhe 2023-02-28 10:19:18 Re: Make some xlogreader messages more accurate