Checkpointer sync queue fills up / loops around pg_usleep() are bad

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Checkpointer sync queue fills up / loops around pg_usleep() are bad
Date: 2022-02-26 21:39:42
Lists: pgsql-hackers


In two recent investigations into occasional test failures
( failures, AIO rebase) the problems are somehow tied to

I don't yet know whether this is actually causally related to precisely those failures, but
when running e.g., I see phases in which many backends
are looping in RegisterSyncRequest() repeatedly, each time sleeping with

Without adding instrumentation this is completely invisible at any log
level. There are no log messages, no wait events, nothing.

ISTM, we should not have any loops around pg_usleep(). And shorter term, we
shouldn't have any loops around pg_usleep() that don't emit log messages / set
wait events. Therefore I propose that we "prohibit" such loops without at
least a DEBUG2 elog() or so. It's just too hard to debug.
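
A minimal sketch of what such a "visible" retry loop could look like, written
as standalone C rather than backend code. Everything named sketch_* is
hypothetical, fprintf() stands in for a DEBUG2 elog(), and nanosleep() stands
in for pg_usleep():

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical callback type: returns true once the request was accepted. */
typedef bool (*sketch_try_fn) (void *arg);

/*
 * Retry try_once() until it succeeds, sleeping sleep_usec between attempts.
 * Every sleep is reported, so the loop is never silent.
 * Returns the number of sleeps taken, or -1 after max_sleeps sleeps.
 */
static int
sketch_retry_with_sleep(sketch_try_fn try_once, void *arg,
                        long sleep_usec, int max_sleeps)
{
    int         sleeps = 0;

    /* nanosleep() needs tv_nsec below one second */
    assert(sleep_usec >= 0 && sleep_usec < 1000000L);

    while (!try_once(arg))
    {
        struct timespec ts = {.tv_sec = 0, .tv_nsec = sleep_usec * 1000L};

        if (sleeps >= max_sleeps)
            return -1;

        /* stand-in for elog(DEBUG2, ...) and/or setting a wait event */
        fprintf(stderr, "DEBUG: request not accepted, sleeping %ld us (retry %d)\n",
                sleep_usec, sleeps + 1);
        nanosleep(&ts, NULL);
        sleeps++;
    }
    return sleeps;
}

/* Example callback: fails until the countdown reaches zero. */
static bool
sketch_countdown_try(void *arg)
{
    int        *remaining = arg;

    if (*remaining > 0)
    {
        (*remaining)--;
        return false;
    }
    return true;
}
```

With a countdown of 2, sketch_retry_with_sleep(sketch_countdown_try, &n, 1000, 10)
sleeps twice and emits two messages before succeeding. The retry cap is an
addition for the sketch only; it just makes a stuck loop show up as an error
instead of spinning forever.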

The reason for the sync queue filling up in is actually
fairly simple:

1) The test runs with shared_buffers = 1MB, leading to a small sync queue of
128 entries.
2) CheckpointWriteDelay() does pg_usleep(100000L)

ForwardSyncRequest() wakes up the checkpointer using SetLatch() if the sync
queue is more than half full.
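
To make the wakeup rule concrete, here is a small standalone model of it. It
is a sketch of the behavior as described above, not the actual
ForwardSyncRequest() code; the struct, its fields and the function name are
hypothetical, and a boolean flag stands in for SetLatch():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the shared sync request queue. */
typedef struct SketchSyncQueue
{
    size_t      num_requests;   /* entries currently queued */
    size_t      max_requests;   /* capacity, 128 in the setup described above */
    bool        wakeup_sent;    /* stand-in for SetLatch() on the checkpointer */
} SketchSyncQueue;

/*
 * Try to queue one request.  Returns false if the queue is full, in which
 * case the caller has to retry later.  Requests a checkpointer wakeup once
 * the queue is more than half full.
 */
static bool
sketch_forward_request(SketchSyncQueue *q)
{
    assert(q->max_requests > 0);

    if (q->num_requests >= q->max_requests)
        return false;           /* full: nothing queued */

    q->num_requests++;

    if (q->num_requests > q->max_requests / 2)
        q->wakeup_sent = true;  /* only helps if the sleeper notices it */

    return true;
}
```

With max_requests = 128 the wakeup is first requested by the 65th queued
entry. If the checkpointer does not react to it, only 63 more requests fit
before callers start getting false back and have to retry.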

But at least on Linux and FreeBSD that doesn't actually interrupt pg_usleep()
anymore (due to using signalfd / kqueue rather than a signal handler). And on
all platforms the signal might arrive just before the pg_usleep() rather than
during it, in which case the sleep isn't interrupted either.

If I shorten the sleep in CheckpointWriteDelay() the problem goes away. This
actually reduces the time for a single run of on my
workstation noticeably. With the default sleep time it's ~32s, with the shortened time
it's ~27s.

I suspect we need to do something about this concrete problem for 14 and
master, because it's certainly worse than before on Linux / FreeBSD.

I suspect the easiest is to just convert that usleep to a WaitLatch(). That'd
require adding a new enum value to WaitEventTimeout in 14. Which probably is
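
A minimal standalone sketch of why a latch-style wait does not lose the
wakeup, assuming a plain POSIX system. This is not PostgreSQL's latch
implementation: SketchLatch and the sketch_* functions are hypothetical and
only mimic the set / wait-with-timeout / reset pattern of SetLatch(),
WaitLatch() and ResetLatch() with a nonblocking pipe and poll():

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical minimal latch built on a nonblocking pipe. */
typedef struct SketchLatch
{
    int         read_fd;
    int         write_fd;
} SketchLatch;

static bool
sketch_latch_init(SketchLatch *latch)
{
    int         fds[2];

    assert(latch != NULL);
    if (pipe(fds) != 0)
        return false;
    /* nonblocking on both ends, so set and reset never stall */
    if (fcntl(fds[0], F_SETFL, O_NONBLOCK) != 0 ||
        fcntl(fds[1], F_SETFL, O_NONBLOCK) != 0)
    {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    latch->read_fd = fds[0];
    latch->write_fd = fds[1];
    return true;
}

/* Wake the waiter.  The byte stays in the pipe until the latch is reset. */
static void
sketch_latch_set(SketchLatch *latch)
{
    char        byte = 0;
    ssize_t     rc;

    do
    {
        rc = write(latch->write_fd, &byte, 1);
    } while (rc < 0 && errno == EINTR);
    /* EAGAIN means the pipe is full, i.e. the latch is already set */
}

/* Returns true if the latch is set, false if timeout_ms elapsed first. */
static bool
sketch_latch_wait(SketchLatch *latch, int timeout_ms)
{
    struct pollfd pfd = {.fd = latch->read_fd, .events = POLLIN};
    int         rc;

    do
    {
        /* simplification: an EINTR restarts the full timeout */
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    return rc > 0 && (pfd.revents & POLLIN) != 0;
}

/* Clear the latch by draining the pipe. */
static void
sketch_latch_reset(SketchLatch *latch)
{
    char        buf[64];

    while (read(latch->read_fd, buf, sizeof(buf)) > 0)
        ;
}
```

The point of the pattern: if sketch_latch_set() runs before
sketch_latch_wait(), the wait still returns immediately because the wakeup is
stored rather than delivered as a one-shot signal. A plain sleep has nothing
comparable, which is exactly the window described above.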


Andres Freund


