Re: Column Filtering in Logical Replication

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Hou, Zhijie/侯 志杰 <houzj(dot)fnst(at)fujitsu(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Column Filtering in Logical Replication
Date: 2022-01-14 13:38:00
Message-ID: 202201141338.je5bniwn7oay@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2022-Jan-14, Amit Kapila wrote:

> 1. Replica Identity handling: Currently the column filter patch gives
> an error during create/alter subscription if the specified column list
> is invalid (Replica Identity columns are missing). It also gives an
> error if the user tries to change the replica identity. However, it
> doesn't deal with cases where the user drops and adds a different
> primary key that has a different set of columns which can lead to
> failure during apply on the subscriber.

Hmm, yeah, I suppose we should check that the primary key is compatible
with the column list in all publications. (I wonder what happens in the
interim, that is, what happens to tuples modified after the initial PK
is dropped and before the new PK is installed. Are these considered to
have "replica identiy nothing"?)

> I think another issue w.r.t column filter patch is that even while
> creating publication (even for 'insert' publications) it should check
> that all primary key columns must be part of published columns,
> otherwise, it can fail while applying on subscriber as it will try to
> insert NULL for the primary key column.

I'm not so sure about the primary key aspects, actually; keep in mind
that the replica can have a different table definition, and it might
have even a completely different primary key. I think this part is up
to the user to set up correctly; we have enough with just trying to make
the replica identity correct.

> 2. Handling of partitioned tables vs. Replica Identity (RI): When
> adding a partitioned table with a column list to the publication (with
> publish_via_partition_root = false), we should check the Replica
> Identity of all its leaf partition as the RI on the partition is the
> one actually takes effect when publishing DML changes. We need to
> check RI while attaching the partition as well, as the newly added
> partitions will automatically become part of publication if the
> partitioned table is part of the publication. If we don't do this the
> later deletes/updates can fail.

Hmm, yeah.

> 3. Tablesync.c handling: Ideally, it would be good if we have a single
> query to fetch both row filters and column filters but even if that is
> not possible in the first version, the behavior should be same for
> both queries w.r.t partitioned tables, For ALL Tables and For All
> Tables In Schema cases.
>
> Currently, the column filter patch doesn't seem to respect For ALL
> Tables and For All Tables In Schema cases, basically, it just copies
> the columns it finds through some of the publications even if one of
> the publications is defined as For All Tables. The row filter patch
> ignores the row filters if one of the publications is defined as For
> ALL Tables and For All Tables In Schema.

Oh, yeah, if a table appears in two publications and one of them is ALL
TABLES [IN SCHEMA], then we don't consider it as an all-columns
publication. You're right, that should be corrected.

> We have done some testing w.r.t above cases with both patches and my
> colleague will share the results.

Great, thanks.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2022-01-14 14:10:43 Re: generic plans and "initial" pruning
Previous Message Thomas Munro 2022-01-14 13:32:35 Re: A test for replay of regression tests