Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad
Date: 2022-03-16 03:04:30
Message-ID: CA+hUKGJf4iF2tnLfN5zWYPjBNkRSNsQOmmTrDpYw-cBrSyrGfg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 2, 2022 at 10:58 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2022-03-02 06:46:23 +1300, Thomas Munro wrote:
> > From a9344bb2fb2a363bec4be526f87560cb212ca10b Mon Sep 17 00:00:00 2001
> > From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
> > Date: Mon, 28 Feb 2022 11:27:05 +1300
> > Subject: [PATCH v2 1/3] Wake up for latches in CheckpointWriteDelay().
>
> LGTM. Would be nice to have this fixed soon, even if it's just to reduce test
> times :)

Thanks for the review. Pushed to master and 14, with the wait event
moved to the end of the enum for the back-patch.

> > From 1eb0266fed7ccb63a2430e4fbbaef2300f2aa0d0 Mon Sep 17 00:00:00 2001
> > From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
> > Date: Tue, 1 Mar 2022 11:38:27 +1300
> > Subject: [PATCH v2 2/3] Fix waiting in RegisterSyncRequest().
>
> LGTM.

Pushed as far back as 12. It could be done for 10 & 11, but indeed
the code starts getting quite different back there, and since there
are no field reports, I think that's OK for now.

A simple repro, for the record: run installcheck with
shared_buffers=256kB, and then partway through, kill -STOP
$checkpointer to simulate being stalled on IO for a while. Backends
will soon start waiting for the checkpointer to drain the queue while
dropping relations. This state was invisible in pg_stat_activity, and the
backends would hang forever if you killed the postmaster and then sent
SIGCONT to the checkpointer.
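
The repro described above could be scripted roughly as follows. This is only a sketch under stated assumptions: the helper function names, the $PGDATA and STALL_AFTER variables, the 20-second default delay, and the use of the "postgres" database are illustrative choices, not anything taken from the patches. It assumes a throwaway cluster in $PGDATA listening on the default port, and that it is run from the top of a built PostgreSQL source tree.

```shell
#!/bin/sh
# Hypothetical helpers for the repro; none of these names come from PostgreSQL.

# Restart the server with the tiny buffer pool mentioned in the email.
restart_with_small_buffers() {
    pg_ctl -D "$PGDATA" -w -o "-c shared_buffers=256kB" restart
}

# Print the checkpointer's PID (pg_stat_activity.backend_type exists in 10+).
checkpointer_pid() {
    psql -X -At -d postgres -c \
        "SELECT pid FROM pg_stat_activity WHERE backend_type = 'checkpointer'"
}

# Start the regression tests in the background, wait a while (assumed
# default: 20 seconds), then freeze the checkpointer to simulate an IO stall.
repro() {
    restart_with_small_buffers || return 1
    make installcheck &
    sleep "${STALL_AFTER:-20}"
    pid=$(checkpointer_pid)
    [ -n "$pid" ] || return 1
    kill -STOP "$pid"
    echo "checkpointer $pid stopped; resume with: kill -CONT $pid"
}
```

After sourcing the file, calling repro leaves the checkpointer stopped; the waiting backends can then be inspected from another session, and kill -CONT with the printed PID resumes the checkpointer.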

> > From 50060e5a0ed66762680ddee9e30acbad905c6e98 Mon Sep 17 00:00:00 2001
> > From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
> > Date: Tue, 1 Mar 2022 17:34:43 +1300
> > Subject: [PATCH v2 3/3] Use condition variable to wait when sync request queue
> > is full.

> [review]

I'll come back to 0003 (condition variable-based improvement) a bit later.
