Re: row filtering for logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Euler Taveira <euler(at)eulerto(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-08-26 03:19:56
Message-ID: CAA4eK1J1FNhDhj85Am1X_5VTxKOYo_nSBMuuMteB3n5YVPMdpw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 26, 2021 at 7:37 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> On Wed, Aug 25, 2021 at 3:28 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> ...
> >
> > Hmm, I think the gain via caching is not visible because we are using
> > simple expressions. It will be visible when we use somewhat complex
> > expressions where expression evaluation cost is significant.
> > Similarly, the impact of this change will magnify and it will also be
> > visible when a publication has many tables. Apart from performance,
> > this change is logically correct as well because it would be any way
> > better if we don't invalidate the cached expressions unless required.
>
> Please tell me what is your idea of a "complex" row filter expression.
> Do you just mean a filter that has multiple AND conditions in it? I
> don't really know if few complex expressions would amount to any
> significant evaluation costs, so I would like to run some timing tests
> with some real examples to see the results.
>

I think this means you didn't even understand or are convinced why the
patch has cache in the first place. As per your theory, even if we
didn't have cache, it won't matter but that is not true otherwise, the
patch wouldn't have it.

> On Wed, Aug 25, 2021 at 6:31 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Aug 25, 2021 at 10:57 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Aug 25, 2021 at 5:52 AM Euler Taveira <euler(at)eulerto(dot)com> wrote:
> > > >
> > > > On Tue, Aug 24, 2021, at 4:46 AM, Peter Smith wrote:
> >
> > > >
> > > > Anyway, I have implemented the suggested cache change because I agree
> > > > it is probably theoretically superior, even if in practice there is
> > > > almost no difference.
> > > >
> > > > I didn't inspect your patch carefully but it seems you add another List to
> > > > control this new cache mechanism. I don't like it. IMO if we can use the data
> > > > structures that we have now, let's implement your idea; otherwise, -1 for this
> > > > new micro optimization.
> > > >
> > >
> > > As mentioned above, without this we will invalidate many cached
> > > expressions even though it is not required. I don't deny that there
> > > might be a better way to achieve the same and if you or Peter have any
> > > ideas, I am all ears.
> > >
> >
> > I see that the new list is added to store row_filter node which we
> > later use to compute expression. This is not required for invalidation
> > but for delaying the expression evaluation till it is required (for
> > example, for truncate, we may not need the row evaluation, so there is
> > no need to compute it). Can we try to postpone the syscache lookup to
> > a later stage when we are actually doing row_filtering? If we can do
> > that, then I think we can avoid having this extra list?
>
> Yes, you are correct - that Node list was re-instated only because you
> had requested that the ExprState evaluation should be deferred until
> it is needed by the pgoutput_row_filter. Otherwise, the additional
> list would not be needed so everything would be much the same as in
> v23 except the invalidations would be more focussed on single tables.
>
> I don't think the syscache lookup can be easily postponed. That logic
> of get_rel_sync_entry processes the table filters of *all*
> publications, so moving that publications loop (including the
> partition logic) into the pgoutput_row_filter seems a bridge too far
> IMO.
>

Hmm, I don't think that is not true. You just need it for the relation
to be processed.

> Furthermore, I am not yet convinced that this ExprState postponement
> is very useful. It may be true that for truncate there is no need to
> compute it, but consider that the user would never even define a row
> filter in the first place unless they intended there will be some CRUD
> operations. So even if the truncate does not need the filter,
> *something* is surely going to need it.
>

Sure, but we don't need to add additional computation until it is required.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bossart, Nathan 2021-08-26 03:24:48 Re: prevent immature WAL streaming
Previous Message Yugo NAGATA 2021-08-26 03:13:13 Re: Fix around conn_duration in pgbench