Re: logical replication empty transactions

From: Andres Freund <andres(at)anarazel(dot)de>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Euler Taveira <euler(at)timbira(dot)com(dot)br>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical replication empty transactions
Date: 2020-03-09 18:30:18
Message-ID: 20200309183018.tzkzwu635sd366ej@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2020-03-06 13:53:02 +0800, Craig Ringer wrote:
> On Mon, 2 Mar 2020 at 19:26, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> > One thing that is not clear to me is how will we advance restart_lsn
> > if we don't send any empty xact in a system where there are many such
> > xacts?
>
> Same way we already do it for writes that are not replicated over
> logical replication, like vacuum work etc. The upstream sends feedback
> with reply-requested. The downstream replies. The upstream advances
> confirmed_flush_lsn, and that lazily updates restart_lsn.

It'll still delay it a bit.

> The bigger issue here is that if you don't send empty txns on logical
> replication you don't get an eager, timely response from the
> replica(s), which delays synchronous replication. You need to send
> empty txns when synchronous replication is enabled, or instead poke
> the walsender to force immediate feedback with reply requested.

Somewhat independent from the issue at hand: It'd be really good if we
could evolve the syncrep framework to support per-database waiting... It
shouldn't be that hard, and the current situation sucks quite a bit (and
yes, I'm to blame).

I'm not quite sure what you mean by "poke the walsender"? Kinda sounds
like sending a signal, but decoding happens inside after the walsender,
so there's no need for that. Do you just mean somehow requesting that
walsender sends a feedback message?

To address the volume we could:

1a) Introduce a pgoutput message type to indicate that the LSN has
advanced, without needing separate BEGIN/COMMIT. Right now BEGIN is
21 bytes, COMMIT is 26. But we really don't need that much here. A
single message should do the trick.

1b) Add a LogicalOutputPluginWriterUpdateProgress parameter (and
possibly rename) that indicates that we are intentionally "ignoring"
WAL. For walsender that callback then could check if it could just
forward the position of the client (if it was entirely caught up
before), or if it should send a feedback request (if syncrep is
enabled, or distance is big).

2) Reduce the rate of 'empty transaction'/feedback request messages. If
we know that we're not going to be blocked waiting for more WAL, or
blocked sending messages out to the network, we don't immediately need
to send out the messages. Instead we could continue decoding until
there's actual data, or until we're going to get blocked.

We could e.g. have a new LogicalDecodingContext callback that is
called whenever WalSndWaitForWal() would wait. That'd check if there's
a pending "need" to send out a 'empty transaction'/feedback request
message. The "need" flag would get cleared whenever we send out data
bearing an LSN for other reasons.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2020-03-09 18:36:21 Re: Improve search for missing parent downlinks in amcheck
Previous Message Tomas Vondra 2020-03-09 18:19:15 Re: Additional improvements to extended statistics