Checkpointer sync queue fills up / loops around pg_usleep() are bad

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Checkpointer sync queue fills up / loops around pg_usleep() are bad
Date: 2022-02-26 21:39:42
Message-ID: 20220226213942.nb7uvb2pamyu26dj@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

In two recent investigations in occasional test failures
(019_replslot_limit.pl failures, AIO rebase) the problems are somehow tied to
checkpointer.

I don't yet know if actually causally related to precisely those failures, but
when running e.g. 027_stream_regress.pl, I see phases in which many backends
are looping in RegisterSyncRequest() repeatedly, each time sleeping with
pg_usleep(10000L).

Without adding instrumentation this is completely invisible at any log
level. There's no log messages, there's no wait events, nothing.

ISTM, we should not have any loops around pg_usleep(). And shorter term, we
shouldn't have any loops around pg_usleep() that don't emit log messages / set
wait events. Therefore I propose that we "prohibit" such loops without at
least a DEBUG2 elog() or so. It's just too hard to debug.

The reason for the sync queue filling up in 027_stream_regress.pl is actually
fairly simple:

1) The test runs with shared_buffers = 1MB, leading to a small sync queue of
128 entries.
2) CheckpointWriteDelay() does pg_usleep(100000L)

ForwardSyncRequest() wakes up the checkpointer using SetLatch() if the sync
queue is more than half full.

But at least on linux and freebsd that doesn't actually interrupt pg_usleep()
anymore (due to using signalfd / kqueue rather than a signal handler). And on
all platforms the signal might arrive just before the pg_usleep() rather than
during, also not causing usleep to be interrupted.

If I shorten the sleep in CheckpointWriteDelay() the problem goes away. This
actually reduces the time for a single run of 027_stream_regress.pl on my
workstation noticably. With default sleep time it's ~32s, with shortened time
it's ~27s.

I suspect we need to do something about this concrete problem for 14 and
master, because it's certainly worse than before on linux / freebsd.

I suspect the easiest is to just convert that usleep to a WaitLatch(). That'd
require adding a new enum value to WaitEventTimeout in 14. Which probably is
fine?

Greetings,

Andres Freund

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chapman Flack 2022-02-26 22:03:04 Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
Previous Message Greg Stark 2022-02-26 21:12:27 Re: Commitfest manager for 2022-03