Unduly short fuse in RequestCheckpoint

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Unduly short fuse in RequestCheckpoint
Date: 2019-03-16 16:07:55
Message-ID: 27830.1552752475@sss.pgh.pa.us
Lists: pgsql-hackers

I noticed an odd buildfarm failure today:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2019-03-16%2012%3A12%3A20

of which the key bit seems to be

2019-03-16 15:20:43.835 UTC [10879304] 003_promote.pl LOG: received replication command: BASE_BACKUP LABEL 'pg_basebackup base backup' NOWAIT
2019-03-16 15:20:45.857 UTC [10879304] 003_promote.pl ERROR: could not request checkpoint because checkpointer not running
2019-03-16 15:20:47.227 UTC [61604144] LOG: received immediate shutdown request

Digging in the buildfarm archives finds seven other occurrences of the
same error in the past three months (I didn't look back further).

The cause of this error is that RequestCheckpoint will give up and fail
after just 2 seconds, which evidently is not long enough on slow or
heavily loaded machines. Since there isn't any good reason why the
checkpointer wouldn't be running, I'm inclined to swing a large hammer
and kick this timeout up to 60 seconds. Thoughts?
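
For illustration only, here is a minimal sketch of the kind of bounded retry loop being described. It is not the actual RequestCheckpoint code. The names wait_until_ready, fake_proc, fake_ready and no_sleep are hypothetical, and the 100 ms poll interval is an assumption chosen so that 20 tries gives about 2 seconds and 600 tries gives about 60 seconds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the check "is the checkpointer running yet?" */
typedef bool (*ready_fn) (void *arg);

/* Hypothetical sleep hook, so the loop can be exercised without real delays */
typedef void (*sleep_fn) (long usec);

/*
 * Poll until is_ready() returns true, giving up after max_tries failed
 * polls have each been followed by a sleep.  Worst-case wait is roughly
 * max_tries * sleep_usec.  Returns true on success, false on timeout.
 */
static bool
wait_until_ready(ready_fn is_ready, void *arg,
                 int max_tries, long sleep_usec, sleep_fn do_sleep)
{
    for (int ntries = 0;; ntries++)
    {
        if (is_ready(arg))
            return true;
        if (ntries >= max_tries)
            return false;       /* caller would report the error here */
        do_sleep(sleep_usec);
    }
}

/* Simulated process that becomes ready only after a number of polls */
struct fake_proc
{
    int         polls_until_ready;
    int         polls_seen;
};

static bool
fake_ready(void *arg)
{
    struct fake_proc *p = arg;

    return p->polls_seen++ >= p->polls_until_ready;
}

/* No-op sleep for the simulation (assumed interval is 100000 usec) */
static void
no_sleep(long usec)
{
    (void) usec;
}
```

With the assumed 100 ms interval, a process that needs 30 seconds to come up (300 polls) fails a 20-try limit but passes a 600-try limit. The longer limit costs nothing in the normal case, since the loop exits as soon as the process is seen.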

regards, tom lane
