Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Checkpointer sync queue fills up / loops around pg_usleep() are bad
Date: 2022-03-01 17:46:23
Message-ID: CA+hUKGLRtjhGWB-dd_B8z6agJaFmfxVTiSyqnka937ss3+VywQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 28, 2022 at 2:36 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On February 27, 2022 4:19:21 PM PST, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> >It seems a little strange to introduce a new wait event that will very
> >often appear into a stable branch, but ... it is actually telling the
> >truth, so there is that.
>
> In the back branches it needs to be at the end of the enum - I assume you intended that just to be for HEAD.

Yeah.

> I wonder whether in HEAD we shouldn't make that sleep duration be computed from the calculation in IsOnSchedule...

I might look into this.

> >The sleep/poll loop in RegisterSyncRequest() may also have another
> >problem. The comment explains that it was a deliberate choice not to
> >do CHECK_FOR_INTERRUPTS() here, which may be debatable, but I don't
> >think there's an excuse to ignore postmaster death in a loop that
> >presumably becomes infinite if the checkpointer exits. I guess we
> >could do:
> >
> >- pg_usleep(10000L);
> >+ WaitLatch(NULL, WL_EXIT_ON_PM_DEATH | WL_TIMEOUT, 10, WAIT_EVENT_SYNC_REQUEST);
> >
> >But... really, this should be waiting on a condition variable that the
> >checkpointer broadcasts on when the queue goes from full to not full,
> >no? Perhaps for master only?
>
> Looks worth improving, but yes, I'd not do it in the back branches.
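
For readers less familiar with the latch machinery: the quoted suggestion replaces a blind 10 ms sleep with a timed wait that also notices when the parent process has gone away. Below is a standalone sketch of that general idea using poll() on a pipe. It is not PostgreSQL's WaitLatch() implementation; the function name, the result enum and the pipe set-up are hypothetical, and unlike WL_EXIT_ON_PM_DEATH it reports the condition to the caller instead of exiting.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch only. */
typedef enum
{
    WAIT_TIMED_OUT,
    WAIT_PEER_DIED,
    WAIT_FAILED
} WaitResult;

/*
 * Wait for up to timeout_ms, but return early if the write end of the
 * "alive" pipe has been closed, i.e. the process holding it has exited.
 * Assumption: nobody ever writes to the pipe, so any readiness on the
 * read end can only mean EOF/hang-up.
 */
static WaitResult
wait_or_detect_death(int alive_read_fd, int timeout_ms)
{
    struct pollfd pfd = {.fd = alive_read_fd, .events = POLLIN};
    int         rc;

    assert(timeout_ms >= 0);

    rc = poll(&pfd, 1, timeout_ms);
    if (rc == 0)
        return WAIT_TIMED_OUT;  /* nothing happened; caller re-checks the queue */
    if (rc < 0)
        return WAIT_FAILED;     /* includes EINTR; caller may simply retry */
    return WAIT_PEER_DIED;      /* EOF or hang-up on the pipe */
}
```

A retry loop would call this in place of the plain sleep and stop looping as soon as it sees WAIT_PEER_DIED, which is the property the pg_usleep() version lacks.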

0003 is a first attempt at that, for master only (on top of 0002 which
is the minimal fix). This shaves another second off
027_stream_regress.pl on my workstation. The main thing I realised is
that I needed to hold interrupts while waiting, which seems like it
should go away with 'tombstone' files as discussed in other threads.
That's not a new problem in this patch, it just looks more offensive
to the eye when you spell it out, instead of hiding it with an
unreported sleep/poll loop...
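
To illustrate the shape of the idea (not the contents of 0003), here is a minimal standalone sketch of "wait on a condition variable that the consumer broadcasts on when the queue goes from full to not full". It uses POSIX threads rather than PostgreSQL's ConditionVariable API, and every name in it (RequestQueue, queue_push, queue_drain, producer, QUEUE_CAPACITY) is made up for the example. It also does nothing about interrupt handling, which is the awkward part mentioned above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define QUEUE_CAPACITY 4        /* arbitrary size for the sketch */

typedef struct
{
    pthread_mutex_t lock;
    pthread_cond_t  not_full;   /* broadcast on the full -> not full edge */
    int             items[QUEUE_CAPACITY];
    int             count;
} RequestQueue;

static void
queue_init(RequestQueue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_full, NULL);
    q->count = 0;
}

/* Producer side: block until there is room, then enqueue. */
static void
queue_push(RequestQueue *q, int item)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAPACITY)      /* loop guards against spurious wakeups */
        pthread_cond_wait(&q->not_full, &q->lock);
    assert(q->count < QUEUE_CAPACITY);
    q->items[q->count++] = item;
    pthread_mutex_unlock(&q->lock);
}

/*
 * Consumer side: absorb all pending requests and wake any waiters, but
 * only if the queue had actually been full.  Returns the number drained.
 */
static int
queue_drain(RequestQueue *q)
{
    int         drained;
    bool        was_full;

    pthread_mutex_lock(&q->lock);
    was_full = (q->count == QUEUE_CAPACITY);
    drained = q->count;
    q->count = 0;
    if (was_full)
        pthread_cond_broadcast(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return drained;
}

/* Thread body for a producer that pushes a single request. */
static void *
producer(void *arg)
{
    queue_push((RequestQueue *) arg, 99);
    return NULL;
}
```

Compared with the sleep/poll loop, a blocked producer here wakes as soon as the consumer makes room instead of up to one sleep interval later, and the wait is visible as a wait rather than hidden inside a loop.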

> I do think it's worth giving that sleep a proper wait event though, even in the back branches.

I'm thinking that 0002 should be back-patched all the way, but 0001
could be limited to 14.

Attachment                                                        Content-Type  Size
v2-0001-Wake-up-for-latches-in-CheckpointWriteDelay.patch         text/x-patch  3.8 KB
v2-0002-Fix-waiting-in-RegisterSyncRequest.patch                  text/x-patch  3.3 KB
v2-0003-Use-condition-variable-to-wait-when-sync-request-.patch   text/x-patch  10.1 KB