Re: pg_stop_backup does not complete

From: David Fetter <david(at)fetter(dot)org>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_stop_backup does not complete
Date: 2010-02-23 19:38:58
Message-ID: 20100223193858.GA2917@fetter.org
Lists: pgsql-hackers

On Tue, Feb 23, 2010 at 06:58:22PM +0000, Simon Riggs wrote:
> On Tue, 2010-02-23 at 09:45 -0800, Josh Berkus wrote:
>
> > 1) Set up a brand new master with an archive_command and
> > archive_mode=on.
> >
> > 2) Start the master
> >
> > 3) Do a pg_start_backup()
> >
> > 4) Realize, based on log error messages, that I've misconfigured
> > the archive_command.
> >
> > 5) Attempt to shut down the master. Master tells me that
> > pg_stop_backup must be run in order to shut down.
> >
> > 6) Execute pg_stop_backup.
> >
> > 7) pg_stop_backup waits forever without ever stopping backup.
> > Every 60 seconds, it gives me a helpful "still waiting" message, but
> > at least in the amount of time I was willing to wait (5 minutes),
> > it never completed.
> >
> > 8) Do an immediate shutdown, as it's the only way I can get the
> > database unstuck.
> >
> > With some experimentation, the problem seems to occur when you
> > have a failing archive_command and a master which currently has no
> > database traffic; for example, if I did some database write
> > activity (a createdb) then pg_stop_backup would complete after
> > about 60 seconds (which, btw, is extremely annoying, but at least
> > tolerable).
> >
> > This issue is 100% reproducible.
>
> IMHO there is no problem in that behaviour. If somebody requests a
> backup then we should wait for it to complete. Kevin's suggestion of
> pg_fail_backup() is the only sensible conclusion there because it
> gives an explicit way out of deadlock.
>
> ISTM the problem is that you didn't test. Steps 3 and 4 should have
> been reversed. Perhaps we should put something in the docs to say
> "and test". The correct resolution is to put in an archive_command
> that works.

+1 for clarifying and extending the docs.
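
As a rough illustration of what an "and test" note in the docs could
point at, here is a minimal sketch in Python that runs an
archive_command template against a throwaway file before
pg_start_backup() is ever called. It assumes only the documented %p
(path of the file to archive), %f (file name only) and %% placeholders.
The helper names, the dummy segment name and the example archive path
are hypothetical, and this is not how the server itself invokes the
command.

```python
import os
import subprocess
import tempfile


def expand_archive_command(template, path, filename):
    """Substitute %p, %f and %% in an archive_command template.

    Hypothetical helper for illustration; other %-sequences are left as-is.
    """
    out = []
    i = 0
    while i < len(template):
        if template[i] == "%" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt == "p":
                out.append(path)
                i += 2
                continue
            if nxt == "f":
                out.append(filename)
                i += 2
                continue
            if nxt == "%":
                out.append("%")
                i += 2
                continue
        out.append(template[i])
        i += 1
    return "".join(out)


def archive_command_works(template):
    """Run the expanded command on a dummy file; True only on exit status 0."""
    with tempfile.TemporaryDirectory() as tmp:
        # Made-up file name; this is not a real WAL segment.
        name = "000000010000000000000001.test"
        src = os.path.join(tmp, name)
        with open(src, "wb") as f:
            f.write(b"dummy segment for testing archive_command")
        cmd = expand_archive_command(template, src, name)
        # The command is passed to a shell, so run this only with a
        # template you trust.
        return subprocess.run(cmd, shell=True).returncode == 0
```

Something like archive_command_works("cp %p /mnt/server/archivedir/%f")
(the directory is a placeholder) would then be run as the same OS user
as the server, and the dummy file removed from the archive afterwards.
A zero exit status only shows that the command can run at all; it does
not prove the archive is usable for recovery.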

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
