Re: logical replication empty transactions

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Euler Taveira <euler(dot)taveira(at)2ndquadrant(dot)com>, Euler Taveira <euler(at)timbira(dot)com(dot)br>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical replication empty transactions
Date: 2020-03-04 05:45:52
Message-ID: CAFiTN-uxKFEV+AEobAUeY2AG=SQWkryt1C46FWutONNe0SiYcQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 4, 2020 at 10:50 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Mar 4, 2020 at 9:52 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Wed, Mar 4, 2020 at 9:12 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Mar 4, 2020 at 7:17 AM Euler Taveira
> > > <euler(dot)taveira(at)2ndquadrant(dot)com> wrote:
> > > >
> > > > On Tue, 3 Mar 2020 at 05:24, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >>
> > > >>
> > > >> Another idea could be that we stream the transaction after some
> > > >> threshold number (say 100 or anything we think is reasonable) of empty
> > > >> xacts. This will reduce the traffic without tinkering with the core
> > > >> design too much.
> > > >>
> > > >>
> > > > Amit, I suggest an interval to control this setting. Time is something we have control; transactions aren't (depending on workload). pg_stat_replication query interval usually is not milliseconds, however, you can execute thousands of transactions in a second. If we agree on that idea I can add it to the patch.
> > > >
> > >
> > > Do you mean to say that if for some threshold interval we didn't
> > > stream any transaction, then we can send the next empty transaction to
> > > the subscriber? If so, then isn't it possible that the empty xacts
> > > happen irregularly after the specified interval and then we still end
> > > up sending them all. I might be missing something here, so can you
> > > please explain your idea in detail? Basically, how will it work and
> > > how will it solve the problem.
> >
> > IMHO, the threshold should be based on the commit LSN. Our main
> > reason we want to send empty transactions after a certain
> > transaction/duration is that we want the restart_lsn to be moving
> > forward so that if we need to restart the replication slot we don't
> > need to process a lot of extra WAL. So assume we set the threshold
> > based on transaction count then there is still a possibility that we
> > might process a few very big transactions then we will have to process
> > them again after the restart.
> >
>
> Won't the subscriber eventually send the flush location for the large
> transactions which will move the restart_lsn?

I meant large empty transactions (basically we can not send anything
to the subscriber). So my point was if there are only large
transactions in the system which we can not stream because those
tables are not published. Then keeping threshold based on transaction
count will not help much because even if we don't reach the
transaction count threshold, we still might need to process a lot of
data if we don't stream the commit for the empty transactions. So
instead of tracking transaction count can we track LSN, and LSN
different since we last stream some change cross the threshold then we
will stream the next empty transaction.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-03-04 06:05:03 Re: Cast to uint16 in pg_checksum_page()
Previous Message John Naylor 2020-03-04 05:32:59 Re: Unicode escapes with any backend encoding