Re: Archiver behavior at shutdown

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Archiver behavior at shutdown
Date: 2007-12-29 00:04:44
Message-ID: 1198886684.9558.64.camel@ebony.site
Lists: pgsql-hackers pgsql-patches

On Thu, 2007-12-27 at 18:54 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > On Thu, 2007-12-27 at 17:29 -0500, Tom Lane wrote:
> >> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> >>> then a subsequent postmaster start could initiate a second archiver
> >>> process which would cause issues with whatever the first archiver is
> >>> doing.
> >>
> >> That's a problem that the archiver itself should fix (perhaps it needs
> >> its own lockfile).
>
> > http://archives.postgresql.org/pgsql-hackers/2006-05/msg00920.php
>
> I thought that sounded familiar ;-).

As you say, I'm beginning to know where the bodies are buried...

> What was the outcome of that
> discussion? No patch for this ever got applied AFAICS. The patch
> as posted had a few issues, per the thread, and I don't see a followup
> version. (The alleged replacement patch did something else entirely.)

We applied a one line change in preference to the lockfile approach for
8.2, requested by you, agreed to by me and applied by Bruce.

This is the behaviour I would choose, if I had a blank canvas:

- keep archiver alive at shutdown, rather than bouncing it

- send it SIGUSR2 to make it finish up and close, just like the bgwriter

- put a lockfile in for the archiver that prevents a new archiver from
starting, but lets everything else come up OK. In the postmaster, if
PgArchPID == 0 then we check for archiver.pid; if it is present, read it
and send a SIGUSR2 to that pid. If the signal fails with ESRCH then the
process is not present, so start up a new archiver

- let's keep archiving, if there is work to do, right up until the last
possible moment, even if the postmaster has gone

- ensure people understand that an archive_command call can be
interrupted and may need to handle the consequences if the command is
not atomic
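The postmaster-side lockfile check in the third point can be sketched in C roughly as follows. This is a minimal illustration, not PostgreSQL code. The names read_archiver_pid(), check_archiver_lockfile() and ArchStartDecision are hypothetical, and so is the lockfile format (a single decimal pid). As proposed above, the sketch treats only ESRCH as "no archiver present".

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical outcome of the postmaster's lockfile check. */
typedef enum
{
    ARCH_START_NEW,             /* no live archiver: launch a new one */
    ARCH_OLD_RUNNING            /* old archiver still alive: don't start another */
} ArchStartDecision;

/* Read the pid stored in the lockfile; 0 if missing or unparsable. */
static pid_t
read_archiver_pid(const char *path)
{
    FILE       *f;
    long        pid = 0;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return 0;
    if (fscanf(f, "%ld", &pid) != 1 || pid <= 0)
        pid = 0;
    fclose(f);
    return (pid_t) pid;
}

/*
 * Hypothetical check run by the postmaster when PgArchPID == 0.
 * If archiver.pid names a process, send it SIGUSR2 (finish up and close)
 * and do not start a second archiver.  If the signal fails with ESRCH,
 * the old archiver is gone: remove the stale lockfile and start a new one.
 */
static ArchStartDecision
check_archiver_lockfile(const char *path)
{
    pid_t       pid = read_archiver_pid(path);

    if (pid == 0)
    {
        (void) unlink(path);    /* missing or garbage file; ENOENT is harmless */
        return ARCH_START_NEW;
    }
    if (kill(pid, SIGUSR2) != 0 && errno == ESRCH)
    {
        (void) unlink(path);    /* stale lockfile left by a dead archiver */
        return ARCH_START_NEW;
    }
    return ARCH_OLD_RUNNING;
}
```

One caveat applies to any pid-file scheme like this. If the old archiver's pid has been recycled by an unrelated process, this check mistakes that process for the archiver and sends it SIGUSR2. A real implementation would need some extra guard against that.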

With those changes the use cases would look like this...

System Shutdown
The system shuts down, the postmaster shuts down, and the archiver works
furiously until the end trying to archive things away. The archiver gets
caught halfway through a copy and dies, leaving archiver.pid behind. On
the subsequent startup the postmaster sees archiver.pid, reads the file
to get the pid, then sends a signal to the archiver to see if it is still
alive. It isn't, so the postmaster removes archiver.pid and allows the
next archiver to start. The first call to archive_command handles the
partially copied file in the archive.
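One way for an archive_command to cope with being interrupted is to make its copy atomic, so that no partial file is ever left under the final name. The sketch below writes to a temporary file, fsync()s it, and then rename()s it into place. It is an illustration only. The helper archive_copy_atomic() and the ".tmp" suffix are hypothetical, and the whole approach assumes the temporary and final files are on the same filesystem, where POSIX rename() is atomic.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy src to dst so that dst is either absent or
 * complete.  Data goes to "<dst>.tmp", is fsync()ed, then renamed into
 * place.  A .tmp left by an earlier interrupted call is truncated and
 * rewritten.  Returns 0 on success, -1 with errno set on failure.
 */
static int
archive_copy_atomic(const char *src, const char *dst)
{
    char        tmp[4096];
    char        buf[8192];
    int         in, out, save_errno;
    ssize_t     n;

    assert(src != NULL && dst != NULL);
    if (access(dst, F_OK) == 0)
    {
        errno = EEXIST;         /* never overwrite an already-archived file */
        return -1;
    }
    if (snprintf(tmp, sizeof(tmp), "%s.tmp", dst) >= (int) sizeof(tmp))
    {
        errno = ENAMETOOLONG;
        return -1;
    }
    if ((in = open(src, O_RDONLY)) < 0)
        return -1;
    if ((out = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600)) < 0)
    {
        save_errno = errno;
        close(in);
        errno = save_errno;
        return -1;
    }
    while ((n = read(in, buf, sizeof(buf))) > 0)
    {
        char       *p = buf;

        while (n > 0)           /* handle short writes */
        {
            ssize_t     w = write(out, p, (size_t) n);

            if (w < 0)
                goto fail;
            p += w;
            n -= w;
        }
    }
    if (n < 0 || fsync(out) != 0)
        goto fail;
    close(in);
    if (close(out) != 0)
    {
        save_errno = errno;
        unlink(tmp);
        errno = save_errno;
        return -1;
    }
    return rename(tmp, dst);    /* dst appears complete, or not at all */

fail:
    save_errno = errno;
    close(in);
    close(out);
    unlink(tmp);
    errno = save_errno;
    return -1;
}
```

The sketch has three known gaps:
- The existence check and the rename are separate steps, so they do not protect against a concurrent writer.
- The containing directory is not fsync()ed after the rename.
- EINTR from read() is not retried.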

Server Crash
Something takes down the server, but the archiver stays up trying to
archive things away. Crash recovery kicks in and finishes very quickly;
the new archiver tries to start up but cannot, because the first archiver
is still working. At the end of its cycle, the first archiver goes away
and allows the new archiver to start and continue operating.

Server Restart
The server shuts down, but there is still work to do, so the first
archiver stays around to finish it. The newly started server tries to
start an archiver but cannot, because of the pid file. It reads the pid
file and sends a signal. The archiver is already shutting down, so it
continues its cycle and then quits. A new archiver starts up under the
new postmaster.
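The archiver's side of the crash and restart cases can be sketched as follows. This is not the real pgarch code; every name in it is hypothetical, including demo_archive_next(), which stands in for archiving one ready WAL file.

Under those assumptions, the archiver behaves like this:
- It claims archiver.pid with O_CREAT | O_EXCL, so a second archiver cannot start while the file exists.
- It treats SIGUSR2 as "finish up and close".
- It keeps archiving while there is work, even after the postmaster is gone.
- It removes the lockfile only on a clean exit.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t finish_requested = 0;

/* SIGUSR2 means "finish up and close" */
static void
handle_sigusr2(int signo)
{
    (void) signo;
    finish_requested = 1;
}

/* Claim archiver.pid; fails with errno == EEXIST while another archiver holds it. */
static int
acquire_archiver_lock(const char *path)
{
    char        line[32];
    int         len, save_errno;
    int         fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);

    if (fd < 0)
        return -1;
    len = snprintf(line, sizeof(line), "%ld\n", (long) getpid());
    if (write(fd, line, (size_t) len) != len)
    {
        save_errno = errno;
        close(fd);
        unlink(path);
        errno = save_errno;
        return -1;
    }
    close(fd);
    return 0;
}

/* Hypothetical stand-in for "archive one ready WAL file": 1 if work was done. */
static int  demo_pending = 3;

static int
demo_archive_next(void)
{
    if (demo_pending > 0)
    {
        demo_pending--;
        return 1;
    }
    return 0;
}

/*
 * Keep archiving while there is work; exit only once SIGUSR2 has arrived
 * and nothing is pending, then release the lockfile for a new archiver.
 */
static int
archiver_main(const char *lockpath, int (*archive_next) (void))
{
    struct sigaction sa;

    assert(lockpath != NULL && archive_next != NULL);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_sigusr2;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR2, &sa, NULL);

    if (acquire_archiver_lock(lockpath) != 0)
        return -1;              /* an older archiver is still working */

    for (;;)
    {
        if (archive_next())
            continue;           /* did some work; check for more */
        if (finish_requested)
            break;              /* idle and asked to close */
        sleep(1);               /* idle: poll again (a signal cuts this short) */
    }
    unlink(lockpath);
    return 0;
}
```

If an archiver dies mid-copy, its archiver.pid is left behind, and acquire_archiver_lock() keeps failing. The stale file has to be cleared first, by the postmaster-side ESRCH check described in the lockfile point above.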

...but that's too much change for me to personally stomach at this stage
of 8.3. My main issue is that I don't have the time to be able to do a
retest of start/stop/restart/crash behaviour and catching all the side
cases is fairly hard, and yet also critical at this stage of play.

For me, the behaviour is close enough now, with the main issue being the
additional wait at the end of pgarch_MainLoop(). It's been there since
8.2, so a simple fix there would be non-invasive and backpatchable also.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
