Mac OS X: system shutdown prevents checkpoint

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: Francois Suter <dba(at)paragraf(dot)ch>
Subject: Mac OS X: system shutdown prevents checkpoint
Date: 2002-04-30 05:26:26
Message-ID: 17395.1020144386@sss.pgh.pa.us
Lists: pgsql-general pgsql-hackers

I've been looking into Francois Suter's recent reports of Postgres not
shutting down cleanly on Mac OS X 10.1. I find that it's quite
reproducible. If you tell the system to shut down in the normal
fashion (e.g., pick "Shut Down" from the Apple menu), the postmaster
does not terminate, leading to WAL recovery upon restart --- or
even worse, failure to restart if the postmaster PID recorded in the
lockfile happens to get assigned to some other daemon.
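To illustrate why a recycled PID is a problem: a common way to test whether the PID stored in a lockfile is still live is to send it signal 0. The sketch below shows that generic probe. It is not taken from the PostgreSQL sources, and the helper name pid_appears_alive is made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): returns 1 if some process
 * with the given PID currently exists, 0 otherwise.
 */
int pid_appears_alive(pid_t pid)
{
    /* kill() with pid <= 0 addresses process groups, not one process */
    assert(pid > 0);

    /* Signal 0 performs the existence and permission checks only */
    if (kill(pid, 0) == 0)
        return 1;

    /* EPERM: the process exists but belongs to another user */
    return errno == EPERM;
}
```

A probe like this can only say that some process owns the PID. It cannot tell a live postmaster from an unrelated daemon that was handed the same PID after a reboot, which is how a stale lockfile can block a restart.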

Observe the normal trace of postmaster shutdown (running with -d4,
logging of timestamps and PIDs enabled):

2002-04-30 00:08:30 [315] DEBUG: pmdie 15
2002-04-30 00:08:30 [315] DEBUG: smart shutdown request
2002-04-30 00:08:30 [331] DEBUG: shutting down
2002-04-30 00:08:32 [331] DEBUG: database system is shut down
2002-04-30 00:08:32 [331] DEBUG: proc_exit(0)
2002-04-30 00:08:32 [331] DEBUG: shmem_exit(0)
2002-04-30 00:08:32 [331] DEBUG: exit(0)
2002-04-30 00:08:32 [315] DEBUG: reaping dead processes
2002-04-30 00:08:32 [315] DEBUG: proc_exit(0)
2002-04-30 00:08:32 [315] DEBUG: shmem_exit(0)
2002-04-30 00:08:32 [315] DEBUG: exit(0)

The postmaster (here PID 315) forks a subprocess to flush shared buffers
and checkpoint the WAL log. When the subprocess exits, the postmaster
removes its lockfile and shuts down. The subprocess takes a minimum of
2 seconds because there's a sleep(2) in the checkpoint fsync code.
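The fork-then-reap pattern described above can be sketched roughly as follows. This is a simplified illustration, not the postmaster's actual code: run_shutdown_child and fake_checkpoint are hypothetical names, and the parent here blocks in waitpid(), whereas the trace suggests the real postmaster reaps the child asynchronously ("reaping dead processes").

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for "flush shared buffers and checkpoint" */
static void fake_checkpoint(void)
{
}

/*
 * Hypothetical sketch: run work() in a forked child and wait for it.
 * Returns the child's exit status, or -1 if fork() or waitpid() fails
 * or the child did not exit normally.
 */
int run_shutdown_child(void (*work)(void))
{
    pid_t child;
    int status;

    assert(work != NULL);

    child = fork();
    if (child < 0)
    {
        /* A fork failure is reported here rather than passing silently */
        perror("fork");
        return -1;
    }
    if (child == 0)
    {
        /* Child: do the shutdown work, then exit without returning */
        work();
        _exit(0);
    }

    /* Parent: wait for the child, retrying if a signal interrupts us */
    while (waitpid(child, &status, 0) < 0)
    {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Only after a call such as run_shutdown_child(fake_checkpoint) returns 0 would the parent go on to remove its lockfile and exit, matching the order of events in the trace above.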

Now here's what I see in the case of shutting down the OS X system:

2002-04-30 00:25:35 [376] DEBUG: pmdie 15
2002-04-30 00:25:35 [376] DEBUG: smart shutdown request

... and nothing more. Actual system shutdown (power down) occurred at
approximately 00:26:06 by my watch, over thirty seconds after the
postmaster received SIGTERM. So there was plenty of time to run the
checkpoint subprocess. (Indeed, I believe that thirty seconds is the
grace period Darwin's init process allows SIGTERM'd processes before
giving up and hard-killing them. So the system was actually sitting and
waiting for the postmaster.)

What we appear to have here is that the kernel is not allowing the
postmaster to fork a checkpoint subprocess. But there's no indication
that the postmaster got a fork() error return, either. Seems like it's
just hung.

Does this ring a bell with anyone? Is it an OS X bug, or a "feature";
and if the latter, how can we work around it?

regards, tom lane
