Re: row filtering for logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-09-24 06:34:17
Message-ID: CAA4eK1+keaZY_pejjt+6OWn-myhyCS=MtJ9o2qQxKrL1_1zXzQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 24, 2021 at 11:52 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, Sep 24, 2021 at 11:06 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Fri, Sep 24, 2021 at 10:50 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > > > 12) misuse of REPLICA IDENTITY
> > > >
> > > > The more I think about this, the more I think we're actually misusing
> > > > REPLICA IDENTITY for something entirely different. The whole purpose of
> > > > RI was to provide a row identifier for the subscriber.
> > > >
> > > > But now we're using it to ensure we have all the necessary columns,
> > > > which is entirely orthogonal to the original purpose. I predict this
> > > > will have rather negative consequences.
> > > >
> > > > People will either switch everything to REPLICA IDENTITY FULL, or create
> > > > bogus unique indexes with extra columns. Which is really silly, because
> > > > it wastes network bandwidth (transfers more data) or local resources
> > > > (CPU and disk space to maintain extra indexes).
> > > >
> > > > IMHO this needs more infrastructure to request extra columns to decode
> > > > (e.g. for the filter expression), and then remove them before sending
> > > > the data to the subscriber.
> > > >
> > >
> > > Yeah, but that would have an additional load on write operations and I
> > > am not sure at this stage but maybe there could be other ways to
> > > extend the current infrastructure wherein we build the snapshots using
> > > which we can access the user tables instead of only catalog tables.
> > > Such enhancements if feasible would be useful not only for allowing
> > > additional column access in row filters but for other purposes like
> > > allowing access to functions that access user tables. I feel we can
> > > extend this later as well seeing the usage and requests. For the first
> > > version, this doesn't sound too limiting to me.
> >
> > I agree with one point from Tomas, that if we bind the row filter with
> > the RI, then if the user has to use the row filter on any column 1)
> > they have to add an unnecessary column to the index 2) Since they have
> > to add it to RI so now we will have to send it over the network as
> > well. 3). We anyway have to WAL log it if it is modified because now
> > we forced users to add some columns to RI because they wanted to use
> > the row filter on that. Now suppose we remove that limitation and we
> > somehow make these changes orthogonal to RI, i.e. if we have a row
> > filter on some column then we WAL log it, so now the only extra cost
> > we are paying is to just WAL log that column, but the user is not
> > forced to add it to index, not forced to send it over the network.
> >
>
> I am not suggesting adding additional columns to RI just for using
> filter expressions. If most users that intend to publish delete/update
> wanted to use filter conditions apart from replica identity then we
> can later extend this functionality but not sure if the only way to
> accomplish that is to log additional data in WAL.
>

One possibility in this regard could be to enhance Replica
Identity .. Include (column_list), where the columns in the include
list won't be sent, but I think it is better to postpone such
enhancements to a later version. As I suggested above, we might
want to extend our infrastructure so that it not only satisfies this
request for extra columns but also allows UDFs (which can access
user tables) and probably sub-queries as well.
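
To make the include-list idea concrete, here is a toy model in Python. It
is only a sketch and not PostgreSQL code: the Include (column_list) syntax
is a proposal that does not exist, and every name below
(project_for_subscriber, identity_cols, include_cols, row_filter) is
hypothetical. It assumes the old row image carries the identity columns
plus the include-list columns, evaluates the filter over all of them, and
then drops the include-only columns before anything is sent.

```python
from typing import Callable, Dict, Optional, Sequence

Row = Dict[str, object]


def project_for_subscriber(
    old_row: Row,
    identity_cols: Sequence[str],
    include_cols: Sequence[str],
    row_filter: Callable[[Row], bool],
) -> Optional[Row]:
    """Toy model: return the columns to send, or None if the row is filtered out."""
    # Columns assumed to be available to the filter: the replica identity
    # columns plus the (hypothetical) include-list columns.
    decoded = {c: old_row[c] for c in (*identity_cols, *include_cols)}

    # The filter may reference include-list columns.
    if not row_filter(decoded):
        return None

    # Include-list columns are stripped; only the identity columns are sent.
    return {c: decoded[c] for c in identity_cols}


row = {"id": 1, "region": "eu", "payload": "x"}
sent = project_for_subscriber(row, ["id"], ["region"], lambda r: r["region"] == "eu")
skipped = project_for_subscriber(row, ["id"], ["region"], lambda r: r["region"] == "us")
```

In this model the subscriber still sees only the identity column, so the
filter column costs nothing on the network, which is the trade-off being
discussed above.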

--
With Regards,
Amit Kapila.
