
Re: Endless recovery

From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Hans-Juergen Schoenig" <postgres(at)cybertec(dot)at>
Cc: <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Endless recovery
Date: 2008-02-11 09:26:08
Message-ID: 47B014B0.3010400@enterprisedb.com
Lists: pgsql-patches
Hans-Juergen Schoenig wrote:
> Last week we saw a problem with a horribly configured machine.
> The disk filled up (bad FSM ;) ) and once this happened the sysadmin killed the
> system (-9).
> After two days PostgreSQL had still not started up, and they tried to restart it
> again and again, making sure that the consistency check was started over and over
> again (thus causing more and more downtime).
> From the admin's point of view there was no way to find out whether the machine
> was actually dead or still recovering.
> 
> Here is a small patch which issues a log message indicating that the recovery 
> process can take ages.
> Maybe this can prevent some admins from interrupting the recovery process.

Wait, are you saying that the time was spent in the rm_cleanup phase? 
That sounds unbelievable. Surely the time was spent in the redo phase, no?

> In our case, the recovery process took 3.5 days !!

That's a ridiculously long time. Was this a normal recovery, not a PITR 
archive recovery? Any idea why the recovery took so long? Given the max. 
checkpoint timeout of 1h, I would expect the recovery to take at most a
few hours even with an extremely write-heavy workload.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
