Re: Race conditions with checkpointer and shutdown

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Race conditions with checkpointer and shutdown
Date: 2019-04-18 14:53:00
Message-ID: 1080.1555599180@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> On Wed, Apr 17, 2019 at 10:45 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I think what we need to look for is reasons why (1) the postmaster
>> never sends SIGUSR2 to the checkpointer, or (2) the checkpointer's
>> main loop doesn't get to noticing shutdown_requested.
>>
>> A rather scary point for (2) is that said main loop seems to be
>> assuming that MyLatch a/k/a MyProc->procLatch is not used for any
>> other purposes in the checkpointer process. If there were something,
>> like say a condition variable wait, that would reset MyLatch at any
>> time during a checkpoint, then we could very easily go to sleep at the
>> bottom of the loop and not notice that there's a pending shutdown request.

> Agreed on the non-composability of that coding, but if there actually
> is anything in that loop that can reach ResetLatch(), it's well
> hidden...

Well, it's easy to see that there's no other ResetLatch call in
checkpointer.c. It's much less obvious that there's no such
call anywhere in the code reachable from e.g. CreateCheckPoint().

To try to investigate that, I hacked things up to force an assertion
failure if ResetLatch was called from any other place in the
checkpointer process (dirty patch attached for amusement's sake).
This gets through check-world without any assertions. That does not
really prove that there aren't corner timing cases where a latch
wait and reset could happen, but it does put a big dent in my theory.
Question is, what other theory has anybody got?

regards, tom lane

Attachment Content-Type Size
assert-there-is-only-one.patch text/x-diff 1.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2019-04-18 15:32:57 Re: block-level incremental backup
Previous Message Tom Lane 2019-04-18 14:09:53 Re: Question about the holdable cursor