Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2019-12-24 08:21:12
Message-ID: CAA4eK1L-KYycdTYanqo3nDzw=XWvADOuerHtbBSnBiRejmE3Qg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 24, 2019 at 11:17 AM Masahiko Sawada
<masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
>
> On Fri, 20 Dec 2019 at 22:30, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> >
> > The main aim of this feature is to reduce apply lag. If we send
> > all the changes together at commit time, network delay can hold up
> > their apply, whereas if most of the changes have already been sent,
> > we save the effort of sending the entire data at commit time.
> > This in itself gives us decent benefits. Sure, we can further improve
> > it by having separate workers (dedicated to applying the changes) as
> > you are suggesting, and in fact there is a patch for that as well (see
> > the performance results and bgworker patch at [1]), but if we try to
> > shove everything in at once, it will be difficult to get this patch
> > committed (there is already a lot here, and the patch is big enough
> > that getting it right takes a lot of energy). So, the plan is to
> > first get the basic feature in and then try to improve it with
> > dedicated workers or something similar. Does this make sense to you?
> >
>
> Thank you for the explanation. The plan makes sense. But I think that in
> the current design it's a problem that the logical replication worker
> doesn't receive changes (and doesn't check interrupts) while applying
> committed changes, even if we don't have a worker dedicated to
> applying. I think the worker should continue to receive changes and
> save them to temporary files even while applying changes.
>

Won't it defeat the purpose of this feature, which is to reduce the apply
lag? Basically, while applying a commit, the worker may constantly
receive changes of other transactions, which will delay the apply of
the current transaction. Also, won't it create further work to
identify the order of commits? Say that while applying commit-1, the
worker receives 5 other commits that are written to separate temporary
files. How will we later identify which transaction's WAL we need to
apply first? We might deduce it from the LSNs, but I think that could
be tricky. I also think it could lead to some design complications,
because while applying a commit, you need some sort of callback or
similar mechanism to receive and flush totally unrelated changes. It
could also introduce another kind of failure mode, wherein the worker,
while applying a commit, receives another transaction's data and some
failure happens while writing the data of that transaction. I am not
sure it is a good idea to try something like that.
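To illustrate the LSN-based ordering mentioned above, here is a minimal C sketch. It is not code from the patch: the `Lsn` typedef, the `SpooledTxn` struct, and the `order_spooled_txns` function are hypothetical, and the sketch assumes that each spooled transaction's commit LSN is already known. It covers only the sorting step. It does not handle transactions whose commit record has not arrived yet, nor the write-failure case raised above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in mirroring PostgreSQL's 64-bit XLogRecPtr. */
typedef uint64_t Lsn;

/*
 * Hypothetical record for a remote transaction whose changes were
 * spooled to a temporary file while another commit was being applied.
 */
typedef struct SpooledTxn
{
    uint32_t    xid;        /* remote transaction id */
    Lsn         commit_lsn; /* LSN of the remote commit record */
    const char *spool_path; /* temporary file holding its changes */
} SpooledTxn;

/* qsort comparator: ascending by commit LSN. */
static int
spooled_txn_cmp(const void *a, const void *b)
{
    Lsn la = ((const SpooledTxn *) a)->commit_lsn;
    Lsn lb = ((const SpooledTxn *) b)->commit_lsn;

    /* Avoid subtraction: a 64-bit difference may not fit in an int. */
    return (la > lb) - (la < lb);
}

/* Sort spooled transactions so they are applied in remote commit order. */
static void
order_spooled_txns(SpooledTxn *txns, size_t n)
{
    assert(txns != NULL || n == 0);
    qsort(txns, n, sizeof(SpooledTxn), spooled_txn_cmp);
}
```

The comparator returns `(la > lb) - (la < lb)` rather than `la - lb` so that large LSN gaps are not truncated when converted to `int`.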

> Otherwise
> the buffer would easily fill up and replication would get stuck.
>

Are you talking about the network buffer? As discussed, I think the best
way is to launch new workers for streamed transactions, but we can do
that as an additional feature. Anyway, as proposed, users can choose
the streaming mode for subscriptions, so there is an option to turn
this on selectively.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
