|From:||Andres Freund <andres(at)anarazel(dot)de>|
|To:||Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>|
|Cc:||Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Fabrice Chapuis <fabrice636861(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>|
|Subject:||Re: Logical replication timeout problem|
On 2023-02-08 10:30:37 -0800, Andres Freund wrote:
> On 2023-02-08 10:18:41 -0800, Andres Freund wrote:
> > I don't think the syncrep logic in WalSndUpdateProgress really works as-is -
> > consider what happens if e.g. the origin filter filters out entire
> > transactions. We'll afaics never get to WalSndUpdateProgress(). In some cases
> > we'll be lucky because we'll return quickly to XLogSendLogical(), but not
> > reliably.
> Is it actually the right thing to check SyncRepRequested() in that logic? It's
> quite common to set up syncrep so that individual users or transactions opt
> into syncrep, but to leave the default disabled.
> I don't really see an alternative to making this depend solely on
While hacking on a rough prototype of how I think this should rather look, I had a few
questions / remarks:
- We probably need to call UpdateProgress from a bunch of places in decode.c
as well? Indicating that we're lagging by a lot, just because all
transactions were in another database seems decidedly suboptimal.
- Why should lag tracking only be updated at commit like points? That seems
like it adds odd discontinuities?
- The mix of skipped_xact and ctx->end_xact in WalSndUpdateProgress() seems
somewhat odd. They have very overlapping meanings IMO.
- There are no UpdateProgress calls in pgoutput_stream_abort(), but ISTM there
should be? It's legit progress.
- That's from 6912acc04f0: I find LagTrackerRead(), LagTrackerWrite() quite
confusing, naming-wise. IIUC "reading" is about receiving confirmation
messages, "writing" about the time the record was generated. ISTM that the
current time is a quite poor approximation in XLogSendPhysical(), but pretty
much meaningless in WalSndUpdateProgress()? Am I missing something?
- Aren't the wal_sender_timeout / 2 checks in WalSndUpdateProgress(),
WalSndWriteData() missing wal_sender_timeout <= 0 checks?
- I don't really understand why f95d53eded55 added !end_xact to the if
condition for ProcessPendingWrites(). Is the theory that we'll end up in an
outer loop soon?
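
To make the wal_sender_timeout point concrete, here is a minimal standalone sketch of the kind of guard I mean. It is not the walsender code: the type timestamp_us and the function half_timeout_elapsed() are hypothetical names for illustration, and timeout_ms only mirrors the wal_sender_timeout GUC (milliseconds, with 0 meaning disabled).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a microsecond timestamp such as TimestampTz. */
typedef int64_t timestamp_us;

/*
 * Returns true once at least half of the timeout has passed since the
 * last reply from the receiver.
 *
 * timeout_ms models wal_sender_timeout: milliseconds, <= 0 means the
 * timeout is disabled. last_reply <= 0 models "no reply seen yet".
 */
static bool
half_timeout_elapsed(timestamp_us now, timestamp_us last_reply, int timeout_ms)
{
    /* Disabled timeout or no reply yet: never report the deadline as hit. */
    if (timeout_ms <= 0 || last_reply <= 0)
        return false;

    /* Convert ms to us, then take half of the timeout. */
    timestamp_us deadline = last_reply + (int64_t) timeout_ms * 1000 / 2;

    return now >= deadline;
}
```

Without the first check, a disabled timeout of 0 makes the deadline equal to the last reply time, so a bare "now >= last_reply + timeout / 2" comparison is true on essentially every call.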
Attached is a current, quite rough, prototype. It addresses some of the points
raised, but far from all. There are also several XXXs/FIXMEs in it. I changed
the file-ending to .txt to avoid hijacking the CF entry.