RE: row filtering for logical replication

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: row filtering for logical replication
Date: 2021-11-26 02:16:42
Message-ID: OS3PR01MB57184FB9EFA06AF190F2870F94639@OS3PR01MB5718.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wednesday, November 24, 2021 1:46 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> On Wed, Nov 24, 2021 at 6:51 AM houzj(dot)fnst(at)fujitsu(dot)com <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Tues, Nov 23, 2021 6:16 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > > On Tue, Nov 23, 2021 at 1:29 PM houzj(dot)fnst(at)fujitsu(dot)com
> > > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > > >
> > > > On Tues, Nov 23, 2021 2:27 PM vignesh C <vignesh21(at)gmail(dot)com>
> wrote:
> > > > > On Thu, Nov 18, 2021 at 7:04 AM Peter Smith
> > > > > <smithpb2250(at)gmail(dot)com>
> > > > > wrote:
> > > > > >
> > > > > > PSA new set of v40* patches.
> > > > >
> > > > > Few comments:
> > > > > 1) When a table is added to the publication, replica identity is
> > > > > checked. But while modifying the publish action to include
> > > > > delete/update, replica identity is not checked for the existing
> > > > > tables. I felt it should be checked for the existing tables too.
> > > >
> > > > In addition to this, I think we might also need some check to
> > > > prevent user from changing the REPLICA IDENTITY index which is used in
> > > > the filter expression.
> > > >
> > > > I was thinking is it possible do the check related to REPLICA
> > > > IDENTITY in function CheckCmdReplicaIdentity() or In
> > > > GetRelationPublicationActions(). If we move the REPLICA IDENTITY
> > > > check to this function, it would be consistent with the existing
> > > > behavior about the check related to REPLICA IDENTITY(see the
> > > > comments in CheckCmdReplicaIdentity) and seems can cover all the
> > > > cases mentioned above.
> > > >
> > >
> > > Yeah, adding the replica identity check in CheckCmdReplicaIdentity()
> > > would cover all the above cases but I think that would put a premium
> > > on each update/delete operation. I think traversing the expression
> > > tree (it could be multiple traversals if the relation is part of
> > > multiple publications) during each update/delete would be costly.
> > > Don't you think so?
> >
> > Yes, I agreed that traversing the expression every time would be costly.
> >
> > I thought maybe we can cache the columns used in row filter or cache
> > only the a
> > flag(can_update|delete) in the relcache. I think every operation that
> > affect the row-filter or replica-identity will invalidate the relcache
> > and the cost of check seems acceptable with the cache.
> >
>
> I think if we can cache this information especially as a bool flag then that should
> probably be better.

Based on this direction, I tried to write a top up POC patch(0005) which I'd like to share.

The top up patch mainly did the following things.

* Move the row filter columns invalidation to CheckCmdReplicaIdentity, so that
the invalidation is executed only when actual UPDATE or DELETE executed on the
published relation. It's consistent with the existing check about replica
identity.

* Cache the results of the validation for row filter columns in relcache to
reduce the cost of the validation. It's safe because every operation that
change the row filter and replica identity will invalidate the relcache.

Also attach the v42 patch set to keep cfbot happy.

Best regards,
Hou zj

Attachment Content-Type Size
v42-0002-PS-Row-filter-validation-walker.patch application/octet-stream 33.3 KB
v42-0003-Support-updates-based-on-old-and-new-tuple-in-ro.patch application/octet-stream 20.3 KB
v42-0004-Tab-auto-complete-and-pgdump-support-for-Row-Fil.patch application/octet-stream 5.4 KB
v42-0001-Row-filter-for-logical-replication.patch application/octet-stream 84.9 KB
v42-0005-Topup-cache-the-result-of-row-filter-column-validation.patch application/octet-stream 27.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message SATYANARAYANA NARLAPURAM 2021-11-26 02:19:03 Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
Previous Message tanghy.fnst@fujitsu.com 2021-11-26 02:15:06 RE: Skipping logical replication transactions on subscriber side