Re: Column Filtering in Logical Replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
Subject: Re: Column Filtering in Logical Replication
Date: 2022-03-21 11:55:28
Message-ID: CAA4eK1KBHES_oY5AzGD7YDboKvK9qV_C9xm9y0fs=erO24sxAQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 19, 2022 at 3:56 AM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 3/18/22 15:43, Tomas Vondra wrote:
> >
>
> As for the issue reported by Shi-San about replica identity full and
> column filters, presumably you're referring to this:
>
> create table tbl (a int, b int, c int);
> create publication pub for table tbl (a, b, c);
> alter table tbl replica identity full;
>
> postgres=# delete from tbl;
> ERROR: cannot delete from table "tbl"
> DETAIL: Column list used by the publication does not cover the
> replica identity.
>
> I believe not allowing column lists with REPLICA IDENTITY FULL is
> expected / correct behavior. I mean, for that to work the column list
> has to always include all columns anyway, so it's pretty pointless. Of
> course, we might check that the column list contains everything, but
> considering the list does always have to contain all columns, and it
> break as soon as you add any columns, it seems reasonable (cheaper) to
> just require no column lists.
>

Fair point. We can leave this as it is.

> I also went through the patch and made the naming more consistent. The
> comments used both "column filter" and "column list" randomly, and I
> think the agreement is to use "list" so I adopted that wording.
>
>
> However, while looking at how pgoutput, I realized one thing - for row
> filters we track them "per operation", depending on which operations are
> defined for a given publication. Shouldn't we do the same thing for
> column lists, really?
>
> I mean, if there are two publications with different column lists, one
> for inserts and the other one for updates, isn't it wrong to merge these
> two column lists?
>

The reason we can't combine row filters for inserts with
updates/deletes is that if inserts have some column that is not
present in RI then during update filtering (for old tuple) it will
give an error as the column won't be present in WAL log.

OTOH, the same problem won't be there for the column list/filter patch
because all the RI columns are there in the column list (for
update/delete) and we don't need to apply a column filter for old
tuples in either update or delete.

Basically, the filter rules are slightly different for row filters and
column lists, so we need them (combine of filters) for one but not for
the other. Now, for the sake of consistency with row filters, we can
do it but as such there won't be any problem or maybe we can just add
a comment for the same in code.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Borisov 2022-03-21 12:11:31 Re: Fix unsigned output for signed values in SLRU error reporting
Previous Message Amit Kapila 2022-03-21 11:28:53 Re: Column Filtering in Logical Replication