Re: Column Filtering in Logical Replication

From: Rahila Syed <rahilasyed90(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Column Filtering in Logical Replication
Date: 2021-08-08 19:59:50
Message-ID: CAH2L28stHzf6Yg3KxBEJ1JEr8C6xM4oSZfysdUObauZ00ArdWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

> >
>> > Currently, this capability is not included in the patch. If the table on
>> > the subscriber
>> > server has lesser attributes than that on the publisher server, it
>> > throws an error at the
>> > time of CREATE SUBSCRIPTION.
>> >
>>
>> That's a bit surprising, to be honest. I do understand the patch simply
>> treats the filtered columns as "unchanged" because that's the simplest
>> way to filter the *data* of the columns. But if someone told me we can
>> "filter columns" I'd expect this to work without the columns on the
>> subscriber.
>>
>> OK, I will look into adding this.
>

This has been added in the attached patch. Now, instead of
treating the filtered columns as unchanged and sending a byte
with that information, unfiltered columns are not sent to the subscriber
server at all. This along with saving the network bandwidth, allows
the logical replication to even work between tables with different numbers
of
columns i.e with the table on subscriber server containing only the
filtered
columns. Currently, replica identity columns are replicated irrespective of
the presence of the column filters, hence the table on the subscriber side
must
contain the replica identity columns.

The patch adds a new parse node PublicationTable, but doesn't add
> copyfuncs.c, equalfuncs.c, readfuncs.c, outfuncs.c support for it.

Thanks, added this.

> Looking at OpenTableList(), I think you forgot to update the comment --
> it says "open relations specified by a RangeVar list",

Thank you for the review, Modified this.

To nitpick, I find "Bitmapset *att_list" a bit annoying, because it's
> not really a list ;-)

Changed this.

>
> It's not super clear to me that strlist_to_textarray() and related
> processing will behave sanely when the column names contain weird
> characters such as commas or quotes, or just when used with uppercase
> column names. Maybe it's worth having tests that try to break such
> cases.

Added a few test cases for this.

In AlterPublicationTables() I was confused by some code that seemed
> commented a bit too verbosely

Modified this as per the suggestion.

: REPLICA IDENTITY columns are always replicated
> : irrespective of column names specification.

... for which you don't have any tests

I have added these tests.

Having said that, I'm not sure I agree with this design decision; what I
> think this is doing is hiding from the user the fact that they are
> publishing columns that they don't want to publish. I think as a user I
> would rather get an error in that case:

ERROR: invalid column list in published set
> DETAIL: The set of published commands does not include all the replica
> identity columns.

or something like that. Avoid possible nasty surprises of security-
> leaking nature.

Ok, Thank you for your opinion. I agree that giving an explicit error in
this case will be safer.
I will include this, in case there are no counter views.

Thank you for your review comments. Please find attached the rebased and
updated patch.

Thank you,
Rahila Syed

Attachment Content-Type Size
v3-0001-Add-column-filtering-to-logical-replication.patch application/octet-stream 44.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-08-08 20:25:00 Re: Another regexp performance improvement: skip useless paren-captures
Previous Message Peter Geoghegan 2021-08-08 19:48:44 Re: ECPG bug fix: DECALRE STATEMENT and DEALLOCATE, DESCRIBE