RE: row filtering for logical replication

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Euler Taveira <euler(at)eulerto(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: row filtering for logical replication
Date: 2022-01-28 03:26:34
Message-ID: OS0PR01MB57169DC3E8D7B7E588CD64DC94229@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wednesday, January 26, 2022 6:57 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Jan 26, 2022 at 8:37 AM houzj(dot)fnst(at)fujitsu(dot)com
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Monday, January 24, 2022 4:38 PM Peter Smith
> <smithpb2250(at)gmail(dot)com> wrote:
> > >
> > >
> > > 3. src/backend/utils/cache/relcache.c - RelationBuildPublicationDesc
> > >
> > > +RelationBuildPublicationDesc(Relation relation)
> > > {
> > > List *puboids;
> > > ListCell *lc;
> > > MemoryContext oldcxt;
> > > Oid schemaid;
> > > - PublicationActions *pubactions = palloc0(sizeof(PublicationActions));
> > > + List *ancestors = NIL;
> > > + Oid relid = RelationGetRelid(relation); AttrNumber invalid_rfcolnum =
> > > + InvalidAttrNumber; PublicationDesc *pubdesc =
> > > + palloc0(sizeof(PublicationDesc)); PublicationActions *pubactions =
> > > + &pubdesc->pubactions;
> > > +
> > > + pubdesc->rf_valid_for_update = true;
> > > + pubdesc->rf_valid_for_delete = true;
> > >
> > > IMO it wold be better to change the "sense" of those variables.
> > > e.g.
> > >
> > > "rf_valid_for_update" --> "rf_invalid_for_update"
> > > "rf_valid_for_delete" --> "rf_invalid_for_delete"
> > >
> > > That way they have the same 'sense' as the AttrNumbers so it all reads better
> to
> > > me.
> > >
> > > Also, it means no special assignment is needed because the palloc0 will set
> > > them correctly
> >
> > Think again, I am not sure it's better to have an invalid_... flag.
> > It seems more natural to have a valid_... flag.
> >

Thanks for the comments !

> Can't we do without these valid_ flags? AFAICS, if we check for
> "invalid_" attributes, it should serve our purpose because those can
> have some attribute number only when the row filter contains some
> column that is not part of RI. A few possible optimizations in
> RelationBuildPublicationDesc:

I slightly refactored the logic here.

> a. It calls contain_invalid_rfcolumn with pubid and then does cache
> lookup to again find a publication which its only caller has access
> to, so can't we pass the same?

Adjusted the code here.

> b. In RelationBuildPublicationDesc(), we call
> GetRelationPublications() to get the list of publications and then
> process those publications. I think if none of the publications has
> row filter and the relation has replica identity then we don't need to
> build the descriptor at all. If we do this optimization inside
> RelationBuildPublicationDesc, we may want to rename function as
> CheckAndBuildRelationPublicationDesc or something like that?

After thinking more on this and considering Alvaro's comments. I did some
changes for the RelationBuildPublicationDesc function to try to
make it more natural.

- Make the function always collect the complete information instead of
returning immediately when find invalid rowfilter.

The reason for this change is: some extensions(3rd-part) might only care
about the cached publication actions, this approach can make sure they can
still get complete pulication actions as usual. Besides, this is also
consistent with the other existing cache management functions(like
RelationGetIndexAttrBitmap ...) which will always build complete information
even if user only want part of it.

- Only cache the flag rf_valid_for_[update|delete] flag in PublicationDesc
instead of the invalid rowfilter column.

Because it's a bit unnatural to me to store an invalid thing in relcache. Note
that now the patch doesn't report the column number in the error message. If we
later decide that the accurate column number or publication is useful, I
think it might be better to add a separate simple function(get_invalid_...)
to report the accurate column or publication instead of reusing the cache
management function.

Also address Peter's comments[1] and Greg's comments[2] [3]

[1] https://www.postgresql.org/message-id/CAHut%2BPsG1G80AoSYka7m1x05vHjKZAzKeVyK4b6CAm2-sTkadg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAJcOf-c7XrtsWSGppb96-eQxPbtg%2BAfssAtTXNYbT8QuhdyOYA%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CAJcOf-f0kc%2B4xGEgkvqNLkbJxMf8Ff0E9gTO2biHDoSJnxyziA%40mail.gmail.com

Attach the V72 patch set which did the above changes.

Best regards,
Hou zj

Attachment Content-Type Size
v72-0001-Allow-specifying-row-filters-for-logical-replication.patch application/octet-stream 156.1 KB
v72-0002-Row-filter-tab-auto-complete-and-pgdump.patch application/octet-stream 6.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2022-01-28 03:28:17 Re: Is there a way (except from server logs) to know the kind of on-going/last checkpoint?
Previous Message Julien Rouhaud 2022-01-28 03:24:13 Re: Is there a way (except from server logs) to know the kind of on-going/last checkpoint?