Re: row filtering for logical replication

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-09-21 05:45:56
Message-ID: CAFiTN-vauBL5fWZRSO0XPcO4ATT2z4epemEDKmeAHXdTorUMtg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 21, 2021 at 10:41 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>

> If you have only a and c in the old tuple, how will it evaluate
> expression c + d?

Well, what I said is that if there is such a dependency, then we will have
to copy that field into the old tuple. For example, if we convert the
filter for the old tuple from (a > 10 and b < 20 and c + d = 20) to
(a > 10 and c + d = 20), then we will not have to copy 'b' into the old
tuple, but we still have to copy 'd' because c + d depends on it.

> I think the point is that if, for some expression, some values are in
> the old tuple and others are in the new one, then the idea proposed in
> the patch seems sane. Moreover, I think with your idea we might need to
> build a new expression for each tuple, sometimes twice, which would
> defeat the purpose of the cache we have kept in the patch, and I am not
> sure it is less costly.

Basically, expression initialization should happen only once in most
cases, so with my suggestion you might have to do it twice. But the
overhead of an extra expression initialization is far less than that of
duplicate evaluation, because evaluation will happen for every update
operation we send, right?

> See another example where splitting filter might not give desired results:
>
> Say filter expression: (a = 10 and b = 20 and c = 30)
>
> Now, old_tuple has values for columns a and c and say values are 10
> and 30. So, the old_tuple will match the filter if we split it as per
> your suggestion. Now say new_tuple has values (a = 5, b = 15, c = 25).
> In such a situation dividing the filter will give us the result that
> the old_tuple is matching but new tuple is not matching which seems
> incorrect. I think dividing filter conditions among old and new tuples
> might not retain its sanctity.

Yeah, that is a good example of why we need to apply the filter to both
tuples. Basically, as in the above example, some conditions might never
be evaluated on the new tuple, and if we have removed such an expression
from the other tuple's filter, we might break something. So, for now,
this suggests that we might not be able to avoid evaluating the
expression twice.
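Amit's counter-example can be checked with a tiny model. It is hypothetical: the filter is just a list of (column, value) equalities, and a conjunct on a column missing from the tuple counts as not satisfied. That approximates SQL, where the comparison yields NULL and the row also fails the filter. The split filter (a = 10 and c = 30) accepts the old tuple. The full filter rejects the new tuple, which gives the inconsistent result Amit describes.

```python
# Hypothetical model: a filter is an AND of (column, required value) pairs.
FILTER = [("a", 10), ("b", 20), ("c", 30)]


def matches(row, conjuncts):
    # A conjunct on a column missing from the row counts as not satisfied
    # (a simplification; in SQL it would yield NULL, which also fails).
    return all(col in row and row[col] == val for col, val in conjuncts)


def restrict(conjuncts, cols):
    # Keep only the conjuncts whose column is present in `cols`.
    return [(col, val) for col, val in conjuncts if col in cols]


old_tuple = {"a": 10, "c": 30}          # only a and c are in the old tuple
new_tuple = {"a": 5, "b": 15, "c": 25}

split_old = matches(old_tuple, restrict(FILTER, old_tuple))  # a = 10 and c = 30
full_old = matches(old_tuple, FILTER)
full_new = matches(new_tuple, FILTER)
print(split_old, full_old, full_new)  # True False False
```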

> > >
> > > Even if it were done, there would still be the overhead of deforming the tuple.
> >
> > Suppose the filter is just (a > 10 and b < 20) and only a is
> > updated; if we are able to modify the filter for the old tuple to
> > be just (a > 10), do we still need to deform?
> >
>
> Without deforming, how will you determine which columns are part of
> the old tuple?

Okay, then we might have to deform, but are we at least ensuring that
once we have deformed the tuple for the expression evaluation, we do not
deform it again while sending the tuple?
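What I am asking for is essentially "deform once, reuse for both filter and send." PostgreSQL's TupleTableSlot keeps deformed values in tts_values/tts_isnull for this kind of reuse. The sketch below models the idea only. LazySlot, row_filter and send are hypothetical names, and the "a=11,b=5,c=30" string is a made-up serialized form, not the real tuple format.

```python
class LazySlot:
    """Hypothetical stand-in for a tuple slot that deforms at most once."""

    def __init__(self, raw):
        self._raw = raw        # made-up serialized form, e.g. "a=11,b=5"
        self._values = None    # cache of deformed column values
        self.deform_count = 0  # how many times we actually deformed

    def values(self):
        # Deform on first access only; later callers reuse the cache.
        if self._values is None:
            self.deform_count += 1
            pairs = (p.split("=") for p in self._raw.split(","))
            self._values = {k: int(v) for k, v in pairs}
        return self._values


def row_filter(slot):
    v = slot.values()
    return v["a"] > 10 and v["b"] < 20


def send(slot):
    # Serialize from the already-deformed values instead of deforming again.
    return ",".join(f"{k}={v}" for k, v in slot.values().items())


slot = LazySlot("a=11,b=5,c=30")
out = send(slot) if row_filter(slot) else None
print(out, slot.deform_count)  # a=11,b=5,c=30 1
```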

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
