Re: row filtering for logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-09-24 06:22:33
Message-ID: CAA4eK1JJAv+ww6UmruEgm3wWc7YESJHgz25D+PK30T5JFAYSkw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 24, 2021 at 11:06 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Fri, Sep 24, 2021 at 10:50 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> > > 12) misuse of REPLICA IDENTITY
> > >
> > > The more I think about this, the more I think we're actually misusing
> > > REPLICA IDENTITY for something entirely different. The whole purpose of
> > > RI was to provide a row identifier for the subscriber.
> > >
> > > But now we're using it to ensure we have all the necessary columns,
> > > which is entirely orthogonal to the original purpose. I predict this
> > > will have rather negative consequences.
> > >
> > > People will either switch everything to REPLICA IDENTITY FULL, or create
> > > bogus unique indexes with extra columns. Which is really silly, because
> > > it wastes network bandwidth (transfers more data) or local resources
> > > (CPU and disk space to maintain extra indexes).
> > >
> > > IMHO this needs more infrastructure to request extra columns to decode
> > > (e.g. for the filter expression), and then remove them before sending
> > > the data to the subscriber.
> > >
> >
> > Yeah, but that would put an additional load on write operations. I
> > am not sure at this stage, but maybe there are other ways to extend
> > the current infrastructure, wherein we build snapshots that let us
> > access user tables instead of only catalog tables. Such
> > enhancements, if feasible, would be useful not only for allowing
> > additional column access in row filters but also for other purposes,
> > like allowing functions that access user tables. I feel we can
> > extend this later as well, after seeing the usage and requests. For
> > the first version, this doesn't sound too limiting to me.
>
> I agree with one point from Tomas: if we bind the row filter to the
> RI, then a user who wants a row filter on an arbitrary column 1) has
> to add an unnecessary column to the index, 2) has to send it over the
> network as well, because it is now part of the RI, and 3) gets it WAL
> logged whenever it is modified anyway, because we forced the user to
> add the column to the RI just to use it in the row filter. Now suppose
> we remove that limitation and somehow make these changes orthogonal to
> the RI, i.e. if we have a row filter on some column then we WAL log
> it. Then the only extra cost we pay is WAL logging that column; the
> user is not forced to add it to the index or to send it over the
> network.
>

I am not suggesting adding columns to the RI just so that they can be
used in filter expressions. If most users who intend to publish
delete/update want filter conditions on columns outside the replica
identity, then we can extend this functionality later, but I am not
sure that logging additional data in WAL is the only way to accomplish
that. I am just trying to see if we can provide meaningful
functionality without extending the scope of this work too much.
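
To make the trade-off concrete, here is a rough toy model in Python. It
is only a sketch of the reasoning in this thread, not code from the
patch: the function names, the dictionary keys, and the assumption that
INSERT-only publications may filter on any column are all hypothetical.

```python
def filter_allowed(replica_identity, filter_columns, publishes_update_or_delete):
    """Model of the first-version restriction (an assumption of this sketch):
    for UPDATE/DELETE, every column used by the row filter must be covered
    by the replica identity, because only those old-row columns are logged."""
    if not publishes_update_or_delete:
        # Assumption: for INSERT only the new row is filtered, so any
        # column is available.
        return True
    return set(filter_columns) <= set(replica_identity)


def old_row_cost(replica_identity, filter_columns, decoupled_from_ri):
    """Compare the two ways of filtering on a column outside the RI.

    decoupled_from_ri=False: the user widens the RI (and its index) to
    include the filter columns.
    decoupled_from_ri=True: hypothetical extension where filter columns
    are WAL-logged but are not part of the RI.
    """
    ri = set(replica_identity)
    extra = set(filter_columns) - ri
    if decoupled_from_ri:
        return {"added_to_index": set(), "wal_logged": ri | extra, "sent": ri}
    return {"added_to_index": extra, "wal_logged": ri | extra, "sent": ri | extra}


if __name__ == "__main__":
    ri, flt = {"id"}, {"region"}
    print(filter_allowed(ri, flt, publishes_update_or_delete=True))
    print(old_row_cost(ri, flt, decoupled_from_ri=False))
    print(old_row_cost(ri, flt, decoupled_from_ri=True))
```

In this model both variants log the same old-row columns in WAL; they
differ only in the index and network cost, which is the point raised
above. The first version avoids both by simply rejecting such filters.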

--
With Regards,
Amit Kapila.
