Re: sync request forward function ForwardSyncRequest() might hang for some time in a corner case?

From: Paul Guo <paulguo(at)gmail(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: sync request forward function ForwardSyncRequest() might hang for some time in a corner case?
Date: 2021-05-27 13:59:10
Message-ID: CABQrizf0Ji-VnLGokbfGSxLM2R+j0d4kShAh+tVw1r-1M-MY-g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, May 27, 2021 at 7:11 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
>
> On Tue, May 25, 2021 at 4:39 PM Paul Guo <paulguo(at)gmail(dot)com> wrote:
> >
> > Hi hackers,
> >
> > I found this when reading the related code. Here is the scenario:
> >
> > bool
> > RegisterSyncRequest(const FileTag *ftag, SyncRequestType type,
> > bool retryOnError)
> >
> > For the case retryOnError is true, the function would in loop call
> > ForwardSyncRequest() until it succeeds, but in ForwardSyncRequest(),
> > we can see if we run into the below branch, RegisterSyncRequest() will
> > need to loop until the checkpointer absorbs the existing requests so
> > ForwardSyncRequest() might hang for some time until a checkpoint
> > request is triggered. This scenario seems to be possible in theory
> > though the chance is not high.
>
> It seems like a really unlikely scenario, but maybe possible if you
> use a lot of unlogged tables maybe (as you could eventually
> dirty/evict more than NBuffers buffers without triggering enough WALs
> activity to trigger a checkpoint with any sane checkpoint
> configuration).

RegisterSyncRequest() handles SYNC_UNLINK_REQUEST and
SYNC_FORGET_REQUEST scenarios, besides the usual SYNC_REQUEST type for
buffer sync.

> > ForwardSyncRequest():
> >
> > if (CheckpointerShmem->checkpointer_pid == 0 ||
> > (CheckpointerShmem->num_requests >= CheckpointerShmem->max_requests &&
> > !CompactCheckpointerRequestQueue()))
> > {
> > /*
> > * Count the subset of writes where backends have to do their own
> > * fsync
> > */
> > if (!AmBackgroundWriterProcess())
> > CheckpointerShmem->num_backend_fsync++;
> > LWLockRelease(CheckpointerCommLock);
> > return false;
> > }
> >
> > One fix is to add below similar code in RegisterSyncRequest(), trigger
> > a checkpoint for the scenario.
> >
> > // checkpointer_triggered: variable for one trigger only.
> > if (!ret && retryOnError && ProcGlobal->checkpointerLatch &&
> > !checkpointer_triggered)
> > SetLatch(ProcGlobal->checkpointerLatch);
> >
> > Any comments?
>
> It looks like you intended to set the checkpointer_triggered var but

Yes this is just pseduo code.

> didn't. Also this will wake up the checkpointer but won't force a
> checkpoint (unlike RequestCheckpoint()). It may be a good thing

I do not expect an immediate checkpoint. AbsorbSyncRequests()
is enough since after that RegisterSyncRequest() could finish.

> though as it would only absorb the requests and go back to sleep if no
> other threshold is reachrf. Apart from the implementation details it
> seems like it could help in this unlikely event.

--
Paul Guo (Vmware)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Paul Guo 2021-05-27 14:04:53 Re: sync request forward function ForwardSyncRequest() might hang for some time in a corner case?
Previous Message Paul Guo 2021-05-27 13:50:28 Re: pg_rewind fails if there is a read only file.