Re: walsender vs. XLogBackgroundFlush during shutdown

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: walsender vs. XLogBackgroundFlush during shutdown
Date: 2019-05-01 16:46:20
Message-ID: 20190501164620.4jjue75tswqdst3l@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 01, 2019 at 08:53:15AM -0700, Andres Freund wrote:
>Hi,
>
>On 2019-05-01 02:28:45 +0200, Tomas Vondra wrote:
>> The reason for that is pretty simple - the walsenders are doing logical
>> decoding, and during shutdown they're waiting for WalSndCaughtUp=true.
>> But per XLogSendLogical() this only happens if
>>
>> if (logical_decoding_ctx->reader->EndRecPtr >= GetFlushRecPtr())
>> {
>> WalSndCaughtUp = true;
>> ...
>> }
>>
>> That is, we need to get beyong GetFlushRecPtr(). But that may never
>> happen, because there may be incomplete (and unflushed) page in WAL
>> buffers, not forced out by any transaction. So if there's a WAL record
>> overflowing to the unflushed page, the walsender will never catch up.
>>
>> Now, this situation is apparently expected, because WalSndWaitForWal()
>> does this:
>>
>> /*
>> * If we're shutting down, trigger pending WAL to be written out,
>> * otherwise we'd possibly end up waiting for WAL that never gets
>> * written, because walwriter has shut down already.
>> */
>> if (got_STOPPING)
>> XLogBackgroundFlush();
>>
>> but unfortunately that does not actually do anything, because right at
>> the very beginning XLogBackgroundFlush() does this:
>>
>> /* back off to last completed page boundary */
>> WriteRqst.Write -= WriteRqst.Write % XLOG_BLCKSZ;
>
>> That is, it intentionally ignores the incomplete page, which means the
>> walsender can't read the record and reach GetFlushRecPtr().
>>
>> XLogBackgroundFlush() does this since (at least) 2007, so how come we
>> never had issues with this before?
>
>I assume that's because of the following logic:
> /* if we have already flushed that far, consider async commit records */
> if (WriteRqst.Write <= LogwrtResult.Flush)
> {
> SpinLockAcquire(&XLogCtl->info_lck);
> WriteRqst.Write = XLogCtl->asyncXactLSN;
> SpinLockRelease(&XLogCtl->info_lck);
> flexible = false; /* ensure it all gets written */
> }
>
>and various pieces of the code doing XLogSetAsyncXactLSN() to force
>flushing. I wonder if the issue is that we're better at avoiding
>unnecessary WAL to be written due to
>6ef2eba3f57f17960b7cd4958e18aa79e357de2f
>

I don't think so, because (a) there are no async commits involved, and (b)
we originally ran into the issue on 9.6 and 6ef2eba3f57f1 is only in 10+.

>
>> But I don't think we're safe without the failover slots patch, because
>> any output plugin can do something similar - say, LogLogicalMessage() or
>> something like that. I'm not aware of a plugin doing such things, but I
>> don't think it's illegal / prohibited either. (Of course, plugins that
>> generate WAL won't be useful for decoding on standby in the future.)
>
>FWIW, I'd consider such an output plugin outright broken.
>

Why? Is that prohibited somewhere, either explicitly or implicitly? I do
see obvious issues with generating WAL from plugin (infinite cycle and so
on), but I suppose that's more a regular coding issue than something that
would make all plugins doing that broken.

FWIW I don't see any particular need to generate WAL from output plugins,
I mentioned it mostly jsut as a convenient way to trigger the issue. I
suppose there are other ways to generate a bit of WAL during shutdown.

>
>> So what I think we should do is to tweak XLogBackgroundFlush() so that
>> during shutdown it skips the back-off to page boundary, and flushes even
>> the last piece of WAL. There are only two callers anyway, so something
>> like XLogBackgroundFlush(bool) would be simple enough.
>
>I think it then just ought to be a normal XLogFlush(). I.e. something
>along the lines of:
>
> /*
> * If we're shutting down, trigger pending WAL to be written out,
> * otherwise we'd possibly end up waiting for WAL that never gets
> * written, because walwriter has shut down already.
> */
> if (got_STOPPING && !RecoveryInProgress())
> XLogFlush(GetXLogInsertRecPtr());
>
> /* Update our idea of the currently flushed position. */
> if (!RecoveryInProgress())
> RecentFlushPtr = GetFlushRecPtr();
> else
> RecentFlushPtr = GetXLogReplayRecPtr(NULL);
>

Perhaps. That would work too, I guess.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-05-01 16:50:16 Re: pgsql: Compute XID horizon for page level index vacuum on primary.
Previous Message Robert Haas 2019-05-01 16:39:13 Re: New vacuum option to do only freezing