Re: Added schema level support for publication.

From: Li Japin <japinli(at)hotmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Added schema level support for publication.
Date: 2021-01-11 10:59:43
Message-ID: A7A618FB-F87C-439C-90A3-93CF9E7341FF@hotmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On Jan 11, 2021, at 5:06 PM, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com<mailto:bharath(dot)rupireddyforpostgres(at)gmail(dot)com>> wrote:

On Mon, Jan 11, 2021 at 1:29 PM japin <japinli(at)hotmail(dot)com<mailto:japinli(at)hotmail(dot)com>> wrote:
Say a user has created a publication for a schema with hundreds of
tables in it, at some point later, can he stop replicating a single or
some tables from that schema?

There is no provision for this currently.

The documentation [1] says, we can ALTER PUBLICATION testpub DROP
TABLE t1; which removes the table from the list of published tables,
but looks like it requires ALTER SUBSCRIPTION testsub REFRESH
PUBLICATION; for the changes to become effective on the subscriber. I
have done some testing for this case:
1) created publication for table t1, see \d+ t1, the associated
publication is visible in the output
2) created subscription on the subscriber, initial available data from
the publisher for table t1 is received
3) insert into table t1 on the publisher
4) inserted data in (3) is received in the subscriber table t1
5) alter publication to drop the table t1 on the publisher, see \d+
t1, there will not be any associated publication in the output
6) execute alter subscription refresh publication on the subscriber,
with the expectation that it should not receive the data from the
publisher for the table t1 since it's dropped from the publication in
(5)
7) insert into table t1 on the publisher
8) still the newly inserted data in (7) from the publisher, will be
received into the table t1 in the subscriber

IIUC, the behaviour of ALTER PUBLICATION DROP TABLE from the docs and
the above use case, it looks like a bug to me. If I'm wrong, can
someone correct me?

Yes, if we modify the publication, we should refresh the subscription on
each subscriber. It looks strange for me, especially for partitioned
tables [1].

Thoughts?

Can we trace the different between publication and subscription, and
auto-refresh subscription on subscriber?

[1]
https://www.postgresql.org/message-id/flat/1D6DCFD2-0F44-4A18-BF67-17C2697B1631%40hotmail.com

As Amit stated in your thread [1], DDLs like creation of the new
tables or partitions, schema changes etc. on the publisher can not be
replicated automatically by the logical replication framework to the
subscriber. Users have to perform those DDLs on the subscribers by
themselves.

Yeah, DDLs is not supported now. On publisher, the partitions are added to the
publication automatically. However, even if we created the partitions on subscriber,
it will not sync the new partitions, because it likes normal table, we must execute
ALTER SUBSCRIPTION my_test REFRESH PUBLICATION;
I preferred it will automatically add to subscription when we create the new partitions
if the partitions is already in publication.

If your point is to at least issue the ALTER SUBSCRIPTION testsub
REFRESH PUBLICATION; from the publication whenever the publication is
altered i.e. added or dropped tables, IMO, we cannot do this, because
running this command on the subscriber only makes sense, after user
runs the same DDLs (which were run on the publisher) also on the
subscriber. To illustrate this:
1) create a new table or partition on the publisher and add it to
publisher, note that the same table has not yet been created on the
subscriber
2) imagine the publisher issuing an auto refresh command to all the
subscribers, then, no point in that right, because the new table or
the partition is not yet created on all the subscribers.

So, IMO, we can not have an auto refresh mechanism, until we have the
feature to replicate the DDL changes to all the subscribers.

Thanks for clarification.

What I stated in my earlier mail [1] is that even though we drop the
table from the publication in the publisher and run a refresh
publication on the subscriber, still the data is being replicated from
the publisher to the subscriber table. I just wanted to know whether
this is the expected behaviour or what exactly means. a user running
ALTER PUBLICATION mypub DROP TABLE mytable;

[1] - https://www.postgresql.org/message-id/CALj2ACWAxO3vSToT0o5nXL%3Drz5cNx90zaV-at%3DcvM14Tag4%3DcQ%40mail.gmail.com

Sorry, I misunderstood. After the test (ce6a71fa530). I found that if we do not insert data
between step (5) and (6), it will not ship the new records, however, if we insert
data between step (5) and (6), it will ship the new records.

(1) created publication for table t1, t2
postgres[8765]=# CREATE TABLE t1 (a int);
CREATE TABLE
postgres[8765]=# CREATE TABLE t2 (a int);
CREATE TABLE
postgres[8765]=# INSERT INTO t1 VALUES (1);
INSERT 0 1
postgres[8765]=# INSERT INTO t2 VALUES (1);
INSERT 0 1
postgres[8765]=# CREATE PUBLICATION mypub1 FOR TABLE t1;
CREATE PUBLICATION
postgres[8765]=# CREATE PUBLICATION mypub2 FOR TABLE t2;
CREATE PUBLICATION

(2) created subscription on the subscriber
postgres[9812]=# CREATE TABLE t1 (a int);
CREATE TABLE
postgres[9812]=# CREATE TABLE t2 (a int);
CREATE TABLE
postgres[9812]=# CREATE SUBSCRIPTION mysub1 CONNECTION 'host=localhost port=8765 dbname=postgres' PUBLICATION mypub1;
NOTICE: created replication slot "mysub1" on publisher
CREATE SUBSCRIPTION
postgres[9812]=# CREATE SUBSCRIPTION mysub2 CONNECTION 'host=localhost port=8765 dbname=postgres' PUBLICATION mypub2;
NOTICE: created replication slot "mysub2" on publisher
CREATE SUBSCRIPTION
postgres[9812]=# TABLE t1;
a
---
1
(1 row)

postgres[9812]=# TABLE t2;
a
---
1
(1 row)

(3) insert into table t1, t2 on the publisher
postgres[8765]=# INSERT INTO t1 VALUES (2);
INSERT 0 1
postgres[8765]=# INSERT INTO t2 VALUES (2);
INSERT 0 1

(4) inserted data in (3) is received in the subscriber table t1, t2
postgres[9812]=# TABLE t1;
a
---
1
2
(2 rows)

postgres[9812]=# TABLE t2;
a
---
1
2
(2 rows)

(5) alter publication to drop table, we insert a record into t1 on publisher
postgres[8765]=# ALTER PUBLICATION mypub1 DROP TABLE t1;
ALTER PUBLICATION
postgres[8765]=# ALTER PUBLICATION mypub2 DROP TABLE t2;
ALTER PUBLICATION
postgres[8765]=# INSERT INTO t1 VALUES (3);
INSERT 0 1

(6) check the data on subscriber
postgres[9812]=# TABLE t1;
a
---
1
2
3
(3 rows)

postgres[9812]=# TABLE t2;
a
---
1
2
(2 rows)

(7) refresh subscription on the subscriber
postgres[9812]=# ALTER SUBSCRIPTION mysub1 REFRESH PUBLICATION;
ALTER SUBSCRIPTION
postgres[9812]=# ALTER SUBSCRIPTION mysub2 REFRESH PUBLICATION;
ALTER SUBSCRIPTION

(8) insert into table t1, t2 on the publisher
postgres[8765]=# INSERT INTO t1 VALUES (4);
INSERT 0 1
postgres[8765]=# INSERT INTO t2 VALUES (4);
INSERT 0 1

(9) the newly inserted data in (5), (7) for table t1 shipped to subscriber, however
t2 doesn’t
postgres[9812]=# TABLE t1;
a
---
1
2
3
4
(4 rows)

postgres[9812]=# TABLE t2;
a
---
1
2
(2 rows)

It might be a bug.

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-01-11 11:00:33 Re: Single transaction in the tablesync worker?
Previous Message Ajin Cherian 2021-01-11 10:23:36 Re: Single transaction in the tablesync worker?