Re: logical replication empty transactions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Euler Taveira <euler(dot)taveira(at)2ndquadrant(dot)com>
Cc: Euler Taveira <euler(at)timbira(dot)com(dot)br>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical replication empty transactions
Date: 2020-03-05 08:45:34
Message-ID: CAA4eK1LSw-LOuAmfk1W2TqQQ+tT=b8NWO6rVq24TqLj5ewVprA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 4, 2020 at 4:04 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Wed, Mar 4, 2020 at 3:47 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Mar 4, 2020 at 11:16 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Mar 4, 2020 at 10:50 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > On Wed, Mar 4, 2020 at 9:52 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > > > >
> > > > >
> > > > > IMHO, the threshold should be based on the commit LSN. The main
> > > > > reason we want to send empty transactions after a certain number of
> > > > > transactions or a certain duration is to keep the restart_lsn moving
> > > > > forward, so that if we need to restart the replication slot we don't
> > > > > need to process a lot of extra WAL. If we set the threshold based on
> > > > > transaction count, we might still process a few very large
> > > > > transactions and then have to process them again after the restart.
> > > > >
> > > >
> > > > Won't the subscriber eventually send the flush location for the large
> > > > transactions which will move the restart_lsn?
> > >
> > > I meant large empty transactions (i.e., ones for which we cannot send
> > > anything to the subscriber). My point was: suppose the only large
> > > transactions in the system are ones we cannot stream because their
> > > tables are not published. Then a threshold based on transaction
> > > count will not help much, because even before we reach the
> > > transaction-count threshold, we might need to process a lot of data
> > > if we don't stream the commit for the empty transactions. So instead
> > > of tracking the transaction count, can we track the LSN? Once the LSN
> > > difference since we last streamed a change crosses the threshold, we
> > > would stream the next empty transaction.
> > >
> >
> > You have a point, and if we want any threshold at all it may be better
> > to base it on LSN, but basing it on transaction count seems a bit more
> > straightforward. Let us see if anyone else has an opinion on this
> > matter.
>
> Ok, that makes sense.
>
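To make the LSN-based proposal above concrete, here is a minimal sketch of the decision it describes, assuming LSNs are treated as plain integer byte positions in WAL. The class and method names (LsnThresholdPolicy, should_send_empty_commit, record_sent) and the threshold_bytes parameter are hypothetical illustrations, not code from the patch or from pgoutput. Sending the commit of an otherwise empty transaction gives the subscriber something to acknowledge, so its reported flush location, and eventually restart_lsn, can move forward.

```python
from dataclasses import dataclass


@dataclass
class LsnThresholdPolicy:
    """Hypothetical policy: stream an otherwise-empty transaction's commit
    once the WAL distance since we last sent anything crosses a threshold."""
    threshold_bytes: int
    last_sent_lsn: int = 0  # LSN modeled as an integer byte position

    def should_send_empty_commit(self, commit_lsn: int) -> bool:
        # WAL distance (in bytes) since anything was last streamed downstream.
        if commit_lsn - self.last_sent_lsn >= self.threshold_bytes:
            self.last_sent_lsn = commit_lsn
            return True
        return False

    def record_sent(self, commit_lsn: int) -> None:
        # Call whenever a non-empty transaction is streamed.
        self.last_sent_lsn = commit_lsn


MB = 1024 * 1024
policy = LsnThresholdPolicy(threshold_bytes=100 * MB)
# Three large unpublished (empty) transactions, each spanning 64 MB of WAL.
decisions = [policy.should_send_empty_commit(n * 64 * MB) for n in (1, 2, 3)]
print(decisions)
```

In this scenario only the second commit is streamed, because it is the first one more than 100 MB past the last sent position; the decision depends on how much WAL has gone by, regardless of how few transactions produced it.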

Euler, can we try updating the patch to use a threshold based on the
number of transactions and see how it works?
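For comparison, a minimal sketch of the transaction-count threshold being requested here might look like the following. TxnCountThresholdPolicy, threshold_txns and the method names are hypothetical illustrations, not the patch's actual code. The policy simply counts consecutive skipped empty transactions and streams the commit of every Nth one.

```python
from dataclasses import dataclass


@dataclass
class TxnCountThresholdPolicy:
    """Hypothetical policy: stream the commit of every Nth consecutive
    empty transaction, regardless of how much WAL those transactions span."""
    threshold_txns: int
    skipped: int = 0

    def should_send_empty_commit(self) -> bool:
        self.skipped += 1
        if self.skipped >= self.threshold_txns:
            self.skipped = 0  # reset after streaming an empty commit
            return True
        return False

    def record_sent(self) -> None:
        # Call whenever a non-empty transaction is streamed.
        self.skipped = 0


policy = TxnCountThresholdPolicy(threshold_txns=3)
decisions = [policy.should_send_empty_commit() for _ in range(4)]
print(decisions)
```

This is simpler because it needs no LSN arithmetic, but it shows the concern raised upthread: with a large threshold and only a few very large unpublished transactions, the counter may never reach the threshold, so nothing is sent for the subscriber to acknowledge.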

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
