Re: Added schema level support for publication.

From: japin <japinli(at)hotmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Added schema level support for publication.
Date: 2021-01-11 07:59:49
Message-ID: MEYP282MB166944617D4A6AFF8FEB8EF7B6AB0@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On Mon, 11 Jan 2021 at 14:15, Bharath Rupireddy wrote:
> On Sun, Jan 10, 2021 at 11:21 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>> On Sat, Jan 9, 2021 at 8:08 PM Bharath Rupireddy
>> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>> > I think this feature can be useful, in case a user has a lot of tables
>> > to publish inside a schema. Having said that, I wonder if this feature
>> > mandates users to create the same schema with same
>> > permissions/authorizations manually on the subscriber, because logical
>> > replication doesn't propagate any ddl's so are the schema or schema
>> > changes? Or is it that the list of tables from the publisher can go
>> > into a different schema on the subscriber?
>> >
>>
>> DDL's will not be propagated to the subscriber. Users have to create
>> the schema & tables in the subscriber. No change in
>> Permissions/authorizations handling, it will be the same as the
>> existing behavior for relations.
>
> Looks like the existing behaviour already requires users to create the
> schema on the subscriber when publishing the tables from that schema.
> Otherwise, an error is thrown on the subscriber [1].
>
> [1] on publisher:
> CREATE SCHEMA myschema;
> CREATE TABLE myschema.t1(a1 int, b1 int);
> INSERT INTO myschema.t1_myschema SELECT i, i+10 FROM generate_series(1,10) i;
> CREATE PUBLICATION testpub FOR TABLE myschema.t1;
>
> on subscriber:
> postgres=# CREATE SUBSCRIPTION testsub CONNECTION 'host=localhost
> dbname=postgres user=bharath port=5432' PUBLICATION testpub;
> ERROR: schema "myschema" does not exist
> CREATE SCHEMA myschema;
> CREATE TABLE myschema.t1(a1 int, b1 int);
> postgres=# CREATE SUBSCRIPTION testsub CONNECTION 'host=localhost
> dbname=postgres user=bharath port=5432' PUBLICATION testpub;
> NOTICE: created replication slot "testsub" on publisher
> CREATE SUBSCRIPTION
>
>> > Since the schema can have other objects such as data types, functions,
>> > operators, I'm sure with your feature, non-table objects will be
>> > skipped.
>> >
>>
>> Yes, only table data will be sent to subscribers, non-table objects
>> will be skipped.
>
> Looks like the existing CREATE PUBLICATION FOR ALL TABLES, which is
> for all the tables in the database, does this i.e. skips non-table
> objects and temporary tables, foreign tables and so on. So, your
> feature also can behave the same way, but within the scope of the
> given schema/s.
>
>> > As Amit pointed out earlier, the behaviour when schema dropped, I
>> > think we should also consider when schema is altered, say altered to a
>> > different name, maybe we should change that in the publication too.
>> >
>>
>> I agree that when schema is altered the renamed schema should be
>> reflected in the publication.
>
> I think, it's not only making sure that the publisher side has the new
> altered schema, but also the subscriber needs those alters. Having
> said that, since these alters come under DDL changes and in logical
> replication we don't publish the scheme changes to the subscriber, we
> may not need to anything extra for informing the schema alters to the
> subscriber from the publisher, the users might have to do the same
> schema alter on the subscriber and then a ALTER SUBSCRIPTION testsub
> REFRESH PUBLICATION; should work for them? If this understanding is
> correct, then we should document this.
>
>> > In general, what happens if we have some temporary tables or foreign
>> > tables inside the schema, will they be allowed to send the data to
>> > subscribers?
>> >
>>
>> Temporary tables & foreign tables will not be added to the publications.
>
> Yes the existing logical replication framework doesn't allow
> replication of temporary, unlogged, foreign tables and other non-table
> relations such as materialized views, indexes etc [1]. The CREATE
> PUBLICATION statement either fails in check_publication_add_relation
> or before that.
>
> CREATE PUBLICATION testpub FOR TABLE tab1, throwing the error if the
> single table tab1 is any of the above restricted tables, seems fine.
> But, if there's a list of tables with CREATE PUBLICATION testpub FOR
> TABLE normal_tab1, temp_tab2, normal_tab3, foreign_tab4,
> unlogged_tab5, normal_tab6, normal_tab7 ......; This query fails on
> first encounter of the restricted table, say at temp_tab2. Whereas,
> CREATE PUBLICATION testpub FOR ALL TABLES; would skip the restricted
> tables and continue to add the accepted tables into the publication
> within the database.
>
> IMHO, if there's a list of tables specified with FOR TABLE, then
> instead of throwing an error in case of any restricted table, we can
> issue a warning and continue with the addition of accepted tables into
> the publication. If done, this behaviour will be in sync with FOR ALL
> TABLES;
>
> Thoughts? If okay, I can work on a patch.
>
> Related to error messages: when foreign table is specified in CREATE
> PUBLICATION statement, then "ERROR: "f1" is not a table", is thrown
> [1], how about the error message "ERROR: foerign table "f1" cannot be
> replicated". In general, it would be good if we could have the error
> messages something like in [2] instead of the existing [1].
>
> Thoughts? If okay, I can work on a patch.
>
> [1]
> t1 is a temporary table:
> postgres=# CREATE PUBLICATION testpub FOR TABLE t1;
> ERROR: table "t1" cannot be replicated
> DETAIL: Temporary and unlogged relations cannot be replicated.
>
> t1 is an unlogged table:
> postgres=# CREATE PUBLICATION testpub FOR TABLE t1;
> ERROR: table "t1" cannot be replicated
> DETAIL: Temporary and unlogged relations cannot be replicated.
>
> f1 is a foreign table:
> postgres=# CREATE PUBLICATION testpub FOR TABLE f1;
> ERROR: "f1" is not a table
> DETAIL: Only tables can be added to publications.
>
> mv1 is a mat view:
> postgres=# CREATE PUBLICATION testpub FOR TABLE mv1;
> ERROR: "mv1" is not a table
>
> idx1 is an index:
> postgres=# CREATE PUBLICATION testpub FOR TABLE idx1;
> ERROR: "idx1" is an index
>
> [2]
> t1 is a temporary table:
> postgres=# CREATE PUBLICATION testpub FOR TABLE t1;
> ERROR: temporary table "t1" cannot be replicated
> DETAIL: Temporary, unlogged and foreign relations cannot be replicated.
>
> t1 is an unlogged table:
> postgres=# CREATE PUBLICATION testpub FOR TABLE t1;
> ERROR: unlogged table "t1" cannot be replicated
> DETAIL: Temporary, unlogged and foreign relations cannot be replicated.
>
> f1 is a foreign table:
> postgres=# CREATE PUBLICATION testpub FOR TABLE f1;
> ERROR: foreign table "f1" cannot be replicated
> DETAIL: Temporary, unlogged and foreign relations cannot be replicated.
>
>> > Say a user has created a publication for a schema with hundreds of
>> > tables in it, at some point later, can he stop replicating a single or
>> > some tables from that schema?
>> >
>>
>> There is no provision for this currently.
>
> The documentation [1] says, we can ALTER PUBLICATION testpub DROP
> TABLE t1; which removes the table from the list of published tables,
> but looks like it requires ALTER SUBSCRIPTION testsub REFRESH
> PUBLICATION; for the changes to become effective on the subscriber. I
> have done some testing for this case:
> 1) created publication for table t1, see \d+ t1, the associated
> publication is visible in the output
> 2) created subscription on the subscriber, initial available data from
> the publisher for table t1 is received
> 3) insert into table t1 on the publisher
> 4) inserted data in (3) is received in the subscriber table t1
> 5) alter publication to drop the table t1 on the publisher, see \d+
> t1, there will not be any associated publication in the output
> 6) execute alter subscription refresh publication on the subscriber,
> with the expectation that it should not receive the data from the
> publisher for the table t1 since it's dropped from the publication in
> (5)
> 7) insert into table t1 on the publisher
> 8) still the newly inserted data in (7) from the publisher, will be
> received into the table t1 in the subscriber
>
> IIUC, the behaviour of ALTER PUBLICATION DROP TABLE from the docs and
> the above use case, it looks like a bug to me. If I'm wrong, can
> someone correct me?
>

Yes, if we modify the publication, we should refresh the subscription on
each subscriber. It looks strange for me, especially for partitioned
tables [1].

> Thoughts?
>

Can we trace the different between publication and subscription, and
auto-refresh subscription on subscriber?

[1]
https://www.postgresql.org/message-id/flat/1D6DCFD2-0F44-4A18-BF67-17C2697B1631%40hotmail.com

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-01-11 08:09:10 Re: Improper use about DatumGetInt32
Previous Message Dmitry Shulga 2021-01-11 07:51:23 Re: Reduce the time required for a database recovery from archive.