Re: Column Filtering in Logical Replication

From: Rahila Syed <rahilasyed90(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Column Filtering in Logical Replication
Date: 2021-09-01 21:21:41
Message-ID: CAH2L28t7GP7bLnoFa_uS_k9wfX7R2mqPz1-CtZBfJOOG+CM8aQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

>> Another point is what if someone drops the column used in one of the
>> publications? Do we want to drop the entire relation from publication
>> or just remove the column filter or something else?
>>
>
After thinking about this, I think it is best to remove the entire table
from publication,
if a column specified in the column filter is dropped from the table.
Because, if we drop the entire filter without dropping the table, it means
all the columns will be replicated,
and the downstream server table might not have those columns.
If we drop only the column from the filter we might have to recreate the
filter and check for replica identity.
That means if the replica identity column is dropped, you can't drop it
from the filter,
and might have to drop the entire publication-table mapping anyways.

Thus, I think it is cleanest to drop the entire relation from publication.

This has been implemented in the attached version.

> Do we want to consider that the columns specified in the filter must
>> not have NOT NULL constraint? Because, otherwise, the subscriber will
>> error out inserting such rows?
>>
>> I think you mean columns *not* specified in the filter must not have NOT
> NULL constraint
> on the subscriber, as this will break during insert, as it will try to
> insert NULL for columns
> not sent by the publisher.
> I will look into fixing this. Probably this won't be a problem in
> case the column is auto generated or contains a default value.
>
>
I am not sure if this needs to be handled. Ideally, we need to prevent the
subscriber tables from having a NOT NULL
constraint if the publisher uses column filters to publish the values of
the table. There is no way
to do this at the time of creating a table on subscriber.
As this involves querying the publisher for this information, it can be
done at the time of initial table synchronization.
i.e error out if any of the subscribed tables has NOT NULL constraint on
non-filter columns.
This will lead to the user dropping and recreating the subscription after
removing the
NOT NULL constraint from the table.
I think the same can be achieved by doing nothing and letting the
subscriber error out while inserting rows.

Minor comments:
>> ================
>> pq_sendbyte(out, flags);
>> -
>> /* attribute name */
>> pq_sendstring(out, NameStr(att->attname));
>>
>> @@ -953,6 +1000,7 @@ logicalrep_write_attrs(StringInfo out, Relation rel)
>>
>> /* attribute mode */
>> pq_sendint32(out, att->atttypmod);
>> +
>> }
>>
>> bms_free(idattrs);
>> diff --git a/src/backend/replication/logical/relation.c
>> b/src/backend/replication/logical/relation.c
>> index c37e2a7e29..d7a7b00841 100644
>> --- a/src/backend/replication/logical/relation.c
>> +++ b/src/backend/replication/logical/relation.c
>> @@ -354,7 +354,6 @@ logicalrep_rel_open(LogicalRepRelId remoteid,
>> LOCKMODE lockmode)
>>
>> attnum = logicalrep_rel_att_by_name(remoterel,
>> NameStr(attr->attname));
>> -
>> entry->attrmap->attnums[i] = attnum;
>>
>> There are quite a few places in the patch that contains spurious line
>> additions or removals.
>>
>>
> Fixed these in the attached patch.

Having said that, I'm not sure I agree with this design decision; what I
> think this is doing is hiding from the user the fact that they are
> publishing columns that they don't want to publish. I think as a user I
> would rather get an error in that case:

ERROR: invalid column list in published set
> DETAIL: The set of published commands does not include all the replica
> identity columns.

Added this.

Also added some more tests. Please find attached a rebased and updated
patch.

Thank you,
Rahila Syed

Attachment Content-Type Size
v4-0001-Add-column-filtering-to-logical-replication.patch application/octet-stream 48.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-09-01 21:25:57 Re: Clarify how triggers relate to transactions
Previous Message Zhihong Yu 2021-09-01 21:20:39 Re: [PATCH] Support pg_ident mapping for LDAP