Re: RecoveryInProgress() has critical side effects

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, amul sul <sulamul(at)gmail(dot)com>
Subject: Re: RecoveryInProgress() has critical side effects
Date: 2021-11-16 18:51:16
Message-ID: 20211116185116.g4qjbvgnurow3bio@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2021-11-11 12:15:03 -0500, Robert Haas wrote:
> The reason why RecoveryInProgress() has critical side effects is that
> it calls InitXLOGAccess(). It turns out that the only critical thing
> that InitXLOGAccess() does is call InitXLogInsert().[1] If a backend
> doesn't call InitXLogInsert(), max_rdatas won't be initialized, and
> the first attempt to insert WAL will probably fail with something like
> "ERROR: too much WAL data". But there's no real reason why
> InitXLogInsert() needs to happen only after recovery is finished.

I think we should consider allocating that memory in postmaster, on !windows
at least. Or, perhaps even better, to initially use some static variable, and
only use a separate memory context when XLogEnsureRecordSpace(). Reserving
all that memory in every process just needlessly increases our per-connection
memory usage, for no good reason.

> The
> work that it does is just setting up data structures in backend-local
> memory, and not much is lost if they are set up and not used. So, in
> 0001, I propose to move the call to InitXLogInsert() from
> InitXLOGAccess() to BaseInit(). That means every backend will do it
> the first time it starts up. It also means that CreateCheckPoint()
> will no longer need a special case call to InitXLogInsert(), which
> seems like a good thing.

Yes. Seems making BaseInit() being executed at a halfway consistent point is
starting to have some benefits at least ;)

> Nevertheless, what I did in 0002 is remove InitXLOGAccess() but move
> the initialization of RedoRecPtr and doPageWrites into
> GetFullPageWritesInfo(), only in the case where RedoRecPtr hasn't been
> set yet. This has one property that I really really like, which is
> that makes the code more understandable. There is no suggestion that
> the correctness of WAL insertion somehow depends on a
> RecoveryInProgress() call which may or may not have occurred at some
> arbitrarily distant time in the past. Initializing the value at the
> point of first use is just way clearer than initializing it as a
> side-effect of some other function that's not even that tightly
> connected. However, it does have the effect of introducing a branch
> into GetFullPageWritesInfo(), which I'm not 100% sure is cheap enough
> not to matter.

Hm. Compared to the other costs of the XLogInsert do/while loop it probably
doesn't matter. One nearly-always-false branch is vastly cheaper than going
through the loop unnecessarily. Sure, the unnecessarily iteration saved will
only be the first (for now, it might be worth to refresh the values more
often), but there's enough branches in the body of the loop that I can't
really see it mattering.

Maybe kinda conceivably the cost of that branch could be an argument if
GetFullPageWritesInfo() were inline in XLogInsert(). But it isn't.

> I think the obvious alternative is to just not
> initialize RedoRecPtr and doPageWrites at all and document in the
> comments that the first time each backend does something with them
> it's going to end up retrying once; perhaps that is preferable. Going
> the other way, we could get rid of the global variables altogether and
> have GetFullPageWrites() read the data from shared memory. However,
> unless 8-byte atomicity is guaranteed, that requires a spinlock, so it
> seems likely unappealing.

I think it'd be fine to just make them 8byte atomics, which'd be lock-free on
relevant platforms (at least once the aarch64 thing is fixed, there's a
patch).

XLogCtlInsert already takes care to ensure that RedoRecPtr/fullPageWrites are
on a cacheline that's not constantly modified.

If we really wanted to optimize the no-8-byte-single-copy-atomicity path, we
could introduce a counter counting how many times RedoRecPtr/fullPageWrites
have changed. But it seems unlikely to be worth it.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2021-11-16 18:57:15 Re: pgsql: Fix headerscheck failure in replication/worker_internal.h
Previous Message Joshua Brindle 2021-11-16 18:26:37 Re: Support for NSS as a libpq TLS backend