Re: Logical replication timeout problem

From: Andres Freund <andres(at)anarazel(dot)de>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: Logical replication timeout problem
Date: 2023-02-08 20:02:35
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 2023-02-08 10:30:37 -0800, Andres Freund wrote:
> On 2023-02-08 10:18:41 -0800, Andres Freund wrote:
> > I don't think the syncrep logic in WalSndUpdateProgress really works as-is -
> > consider what happens if e.g. the origin filter filters out entire
> > transactions. We'll afaics never get to WalSndUpdateProgress(). In some cases
> > we'll be lucky because we'll return quickly to XLogSendLogical(), but not
> > reliably.
> Is it actually the right thing to check SyncRepRequested() in that logic? It's
> quite common to set up syncrep so that individual users or transactions opt
> into syncrep, but to leave the default disabled.
> I don't really see an alternative to making this depend solely on
> sync_standbys_defined.

Hacking on a rough prototype how I think this should rather look, I had a few
questions / remarks:

- We probably need to call UpdateProgress from a bunch of places in decode.c
as well? Indicating that we're lagging by a lot, just because all
transactions were in another database seems decidedly suboptimal.

- Why should lag tracking only be updated at commit like points? That seems
like it adds odd discontinuinities?

- The mix of skipped_xact and ctx->end_xact in WalSndUpdateProgress() seems
somewhat odd. They have very overlapping meanings IMO.

- there's no UpdateProgress calls in pgoutput_stream_abort(), but ISTM there
should be? It's legit progress.

- That's from 6912acc04f0: I find LagTrackerRead(), LagTrackerWrite() quite
confusing, naming-wise. IIUC "reading" is about receiving confirmation
messages, "writing" about the time the record was generated. ISTM that the
current time is a quite poor approximation in XLogSendPhysical(), but pretty
much meaningless in WalSndUpdateProgress()? Am I missing something?

- Aren't the wal_sender_timeout / 2 checks in WalSndUpdateProgress(),
WalSndWriteData() missing wal_sender_timeout <= 0 checks?

- I don't really understand why f95d53edged55 added !end_xact to the if
condition for ProcessPendingWrites(). Is the theory that we'll end up in an
outer loop soon?

Attached is a current, quite rough, prototype. It addresses some of the points
raised, but far from all. There's also several XXXs/FIXMEs in it. I changed
the file-ending to .txt to avoid hijacking the CF entry.


Andres Freund

Attachment Content-Type Size
v1-0001-WIP-Initial-sketch-of-progress-update-rework.txt text/plain 26.4 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Bagga, Rishu 2023-02-08 20:04:52 Re: SLRUs in the main buffer pool - Page Header definitions
Previous Message Peter Smith 2023-02-08 19:08:27 Re: Deadlock between logrep apply worker and tablesync worker