Re: row filtering for logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-12-14 05:20:14
Message-ID: CAA4eK1JdLzJEmxxzEEYAOg41Om3Y88uL+7CgXdvnAaj7hkw8BQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 14, 2021 at 4:44 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Tue, Dec 7, 2021 at 5:48 PM tanghy(dot)fnst(at)fujitsu(dot)com
> <tanghy(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > I think for "FOR ALL TABLE" publication(p2 in my case), table tbl should be
> > treated as no filter, and table tbl should have no filter in subscription sub. Thoughts?
> >
> > But for now, the filter(a > 10) works both when copying initial data and later changes.
> >
> > To fix it, I think we can check if the table is published in a 'FOR ALL TABLES'
> > publication or published as part of schema in function pgoutput_row_filter_init
> > (which was introduced in v44-0003 patch), also we need to make some changes in
> > tablesync.c.
> >
>
> Partly fixed in v46-0005 [1]
>
> NOTE
> - The initial COPY part of the tablesync does not take the publish
> operation into account so it means that if any of the subscribed
> publications have "puballtables" flag then all data will be copied
> sans filters.
>

I think this should be okay but the way you have implemented it in the
patch doesn't appear to be the optimal way. Can't we fetch
allpubtables info and qual info as part of one query instead of using
separate queries?

> I guess this is consistent with the other decision to
> ignore publication operations [2].
>
> TODO
> - Documentation
> - IIUC there is a similar case yet to be addressed - FOR ALL TABLES IN SCHEMA
>

Yeah, "FOR ALL TABLES IN SCHEMA" should also be addressed. In this
case, the difference would be that we need to check the presence of
schema corresponding to the table (for which we are fetching
row_filter information) is there in pg_publication_namespace. If it
exists then we don't need to apply row_filter for the table. I feel it
is better to fetch all this information as part of the query which you
are using to fetch row_filter info. The idea is to avoid the extra
round-trip between subscriber and publisher.

Few other comments:
===================
1.
@@ -926,6 +928,22 @@ pgoutput_row_filter_init(PGOutputData *data,
Relation relation, RelationSyncEntr
bool rfisnull;

/*
+ * If the publication is FOR ALL TABLES then it is treated same as if this
+ * table has no filters (even if for some other publication it does).
+ */
+ if (pub->alltables)
+ {
+ if (pub->pubactions.pubinsert)
+ no_filter[idx_ins] = true;
+ if (pub->pubactions.pubupdate)
+ no_filter[idx_upd] = true;
+ if (pub->pubactions.pubdelete)
+ no_filter[idx_del] = true;
+
+ continue;
+ }

Is there a reason to continue checking the other publications if
no_filter is true for all kind of pubactions?

2.
+ * All row filter expressions will be discarded if there is one
+ * publication-relation entry without a row filter. That's because
+ * all expressions are aggregated by the OR operator. The row
+ * filter absence means replicate all rows so a single valid
+ * expression means publish this row.

This same comment is at two places, remove from one of the places. I
think keeping it atop for loop is better.

3.
+ {
+ int idx;
+ bool found_filters = false;

I am not sure if starting such ad-hoc braces in the code to localize
the scope of variables is a regular practice. Can we please remove
this?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-12-14 05:31:24 Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set
Previous Message vignesh C 2021-12-14 04:23:06 Re: Skipping logical replication transactions on subscriber side