Re: logical replication empty transactions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Euler Taveira <euler(dot)taveira(at)2ndquadrant(dot)com>, Euler Taveira <euler(at)timbira(dot)com(dot)br>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical replication empty transactions
Date: 2020-03-04 05:19:54
Message-ID: CAA4eK1+Naj0+3wsroFnAAu+HLTQUo_oPCZFrsCKwoxs7PAWtPQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 4, 2020 at 9:52 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Wed, Mar 4, 2020 at 9:12 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Mar 4, 2020 at 7:17 AM Euler Taveira
> > <euler(dot)taveira(at)2ndquadrant(dot)com> wrote:
> > >
> > > On Tue, 3 Mar 2020 at 05:24, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >>
> > >>
> > >> Another idea could be that we stream the transaction after some
> > >> threshold number (say 100 or anything we think is reasonable) of empty
> > >> xacts. This will reduce the traffic without tinkering with the core
> > >> design too much.
> > >>
> > >>
> > > Amit, I suggest an interval to control this setting. Time is something we have control over; transactions are not (it depends on the workload). pg_stat_replication is usually not queried at millisecond intervals, yet you can execute thousands of transactions in a second. If we agree on that idea, I can add it to the patch.
> > >
> >
> > Do you mean that if we didn't stream any transaction for some
> > threshold interval, then we can send the next empty transaction to
> > the subscriber? If so, isn't it possible that the empty xacts arrive
> > irregularly, each after the specified interval, so that we still end
> > up sending them all? I might be missing something here, so can you
> > please explain your idea in detail? Basically, how will it work and
> > how will it solve the problem?
>
> IMHO, the threshold should be based on the commit LSN. The main
> reason we want to send empty transactions after a certain number of
> transactions or a certain duration is that we want the restart_lsn to
> keep moving forward, so that if we need to restart the replication slot
> we don't need to process a lot of extra WAL. If we set the threshold
> based on a transaction count, there is still a possibility that we
> process a few very big transactions and then have to process them
> again after the restart.
>

Won't the subscriber eventually send the flush location for the large
transactions which will move the restart_lsn?
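The three policies discussed in this thread (a count of skipped empty transactions, a time interval, and a WAL distance measured from the commit LSN) can be compared with a small sketch. This is an illustration under stated assumptions, not the patch under discussion: the names `EmptyXactPolicy`, `EmptyXactFilter`, `max_skipped`, `max_interval_s` and `max_lsn_gap` are hypothetical and do not exist in PostgreSQL or pgoutput, LSNs are modeled as plain integers, and the current time is passed in explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmptyXactPolicy:
    # Hypothetical knobs; none of these names exist in PostgreSQL.
    max_skipped: Optional[int] = None       # count-based threshold
    max_interval_s: Optional[float] = None  # time-based threshold (seconds)
    max_lsn_gap: Optional[int] = None       # LSN-based threshold (bytes of WAL)


class EmptyXactFilter:
    """Decide whether a transaction's BEGIN/COMMIT is sent downstream."""

    def __init__(self, policy: EmptyXactPolicy,
                 start_lsn: int = 0, start_time: float = 0.0):
        self.policy = policy
        self.skipped = 0                  # consecutive empty xacts skipped
        self.last_sent_lsn = start_lsn    # commit LSN of last sent xact
        self.last_sent_time = start_time  # time the last xact was sent

    def should_send(self, commit_lsn: int, now: float, is_empty: bool) -> bool:
        if not is_empty:
            # Transactions with changes are always sent.
            send = True
        else:
            p = self.policy
            send = (
                (p.max_skipped is not None
                 and self.skipped >= p.max_skipped)
                or (p.max_interval_s is not None
                    and now - self.last_sent_time >= p.max_interval_s)
                or (p.max_lsn_gap is not None
                    and commit_lsn - self.last_sent_lsn >= p.max_lsn_gap)
            )
        if send:
            # Any sent transaction (empty or not) resets every baseline,
            # since the subscriber can confirm a flush position for it.
            self.skipped = 0
            self.last_sent_lsn = commit_lsn
            self.last_sent_time = now
        else:
            self.skipped += 1
        return send


# LSN-based policy: a large non-empty transaction resets the baseline.
f = EmptyXactFilter(EmptyXactPolicy(max_lsn_gap=1000))
print([f.should_send(5000, 0.0, False),   # big transaction with changes
       f.should_send(5400, 0.0, True),    # empty, only 400 bytes later
       f.should_send(6100, 0.0, True)])   # empty, 1100 bytes later
```

With the LSN-based policy, the non-empty transaction at LSN 5000 moves the baseline forward, matching the point above that a subscriber's flush confirmation for large transactions already advances restart_lsn. With the time-based policy, empty transactions spaced further apart than the interval are all sent, which is the concern raised about irregular arrivals.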

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
