Race conditions with checkpointer and shutdown

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Race conditions with checkpointer and shutdown
Date: 2019-04-16 07:01:19
Message-ID: 20190416070119.GK2673@paquier.xyz
Lists: pgsql-hackers

Hi all,

This is a continuation of the following thread, but I prefer spawning
a new thread for clarity:
https://www.postgresql.org/message-id/20190416064512.GJ2673@paquier.xyz

The buildfarm has reported two similar failures when shutting down a
node:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet&dt=2019-03-23%2022%3A28%3A59
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2019-04-16%2006%3A14%3A01

In both cases, the instance fails to shut down: it times out waiting
for the shutdown checkpoint to finish, but I suspect that this
checkpoint actually never happens.

The first case involves piculet, which uses --disable-atomics and
gcc 6, on the recovery test 016_min_consistency, where we trigger a
checkpoint and then issue a fast shutdown on a standby. At this point
the test waits forever.

The second case involves dragonet, which uses clang and has JIT
enabled. The failure is on test 009_twophase.pl, and it happens after
the test prepares transaction xact_009_11, where a *standby* gets
restarted. Again, the test waits forever for the instance to shut
down.

The most recent commits which have touched checkpoints are 0dfe3d0e
and c6c9474a, which map roughly to the point where the failures
began to happen, so I suspect that something related to clean
shutdowns of standbys has been broken since then.

Thanks,
--
Michael
