Re: Recovery will take 10 hours

From: Jeff Frost <jeff(at)frostconsultingllc(dot)com>
To: Brendan Duddridge <brendan(at)clickspace(dot)com>
Cc: PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Recovery will take 10 hours
Date: 2006-04-20 23:26:57
Message-ID: Pine.LNX.4.64.0604201625570.1527@glacier.frostconsultingllc.com
Lists: pgsql-performance


Brendan,

Is your NFS share mounted hard or soft? Do you have space to copy the files
locally? I suspect you're seeing NFS slowness in your restore since you
aren't using much in the way of disk IO or CPU.
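
If copying everything locally isn't practical, another option is to have the restore step retry when an archived segment is briefly unreachable, rather than failing on the first miss. Below is a minimal sketch, assuming Python is available on the database host and that the archive holds gzip-compressed segments named <segment>.gz as in your listing. The function name fetch_wal, its parameters, and the retry counts are all hypothetical; this is not something PostgreSQL ships.

```python
import gzip
import os
import shutil
import time


def fetch_wal(archive_dir, wal_name, dest_path, attempts=5, delay=2.0):
    """Decompress archive_dir/<wal_name>.gz to dest_path.

    Retries a bounded number of times if the source is missing, which
    is how a brief NFS disconnect would show up. Returns True on
    success and False if the file never appeared.
    """
    src = os.path.join(archive_dir, wal_name + ".gz")
    tmp = dest_path + ".tmp"
    for attempt in range(attempts):
        try:
            # Write to a temporary name first so a half-copied segment
            # is never visible under the final name.
            with gzip.open(src, "rb") as fin, open(tmp, "wb") as fout:
                shutil.copyfileobj(fin, fout)
            os.replace(tmp, dest_path)
            return True
        except FileNotFoundError:
            if attempt + 1 < attempts:
                time.sleep(delay)
    return False


def main(argv):
    """Hypothetical entry point: argv = [script, wal_name, dest_path]."""
    ok = fetch_wal("/wal_archive", argv[1], argv[2])
    # A non-zero exit status tells the server the file is unavailable.
    return 0 if ok else 1
```

The retries are deliberately bounded: during recovery the server also asks for files that genuinely are not in the archive (for example, the segment just past the last archived one), and the command has to return a non-zero status for those so recovery can finish. A script built around main() could be called from restore_command with %f and %p as its two arguments.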

-Jeff
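
P.S. When matching archive file names against the log quoted below, it helps to know how the 24-hex-digit segment names break down: 8 digits of timeline, 8 of log file number, 8 of segment number. A small sketch of that decoding follows; parse_wal_name is a hypothetical helper name, and the layout is inferred from the "log file 399, segment 55" text in the quoted log.

```python
def parse_wal_name(name):
    """Split a 24-hex-digit WAL segment name into (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % (name,))
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return timeline, log, segment


# The segment the restore first failed on:
print(parse_wal_name("000000010000018F00000037"))  # (1, 399, 55)
```

That agrees with the server's own message for that file, so the name it asked for and the name in the directory listing refer to the same segment.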

On Thu, 20 Apr 2006, Brendan Duddridge wrote:

> Oops... forgot to mention that both files that postgres said were missing are
> in fact there:
>
> A partial listing from our wal_archive directory:
>
> -rw------- 1 postgres staff 4971129 Apr 19 20:08 000000010000018F00000036.gz
> -rw------- 1 postgres staff 4378284 Apr 19 20:09 000000010000018F00000037.gz
>
> There didn't seem to be any issues with the NFS mount. Perhaps it briefly
> disconnected and came back right away.
>
>
> Thanks!
>
>
> ____________________________________________________________________
> Brendan Duddridge | CTO | 403-277-5591 x24 | brendan(at)clickspace(dot)com
>
> ClickSpace Interactive Inc.
> Suite L100, 239 - 10th Ave. SE
> Calgary, AB T2G 0V9
>
> http://www.clickspace.com
>
> On Apr 20, 2006, at 5:11 PM, Brendan Duddridge wrote:
>
>> Hi Jeff,
>>
>> The WAL files are stored on a separate server and accessed through an NFS
>> mount located at /wal_archive.
>>
>> However, the restore failed about 5 hours in after we got this error:
>>
>> [2006-04-20 16:41:28 MDT] LOG: restored log file "000000010000018F00000034" from archive
>> [2006-04-20 16:41:35 MDT] LOG: restored log file "000000010000018F00000035" from archive
>> [2006-04-20 16:41:38 MDT] LOG: restored log file "000000010000018F00000036" from archive
>> sh: line 1: /wal_archive/000000010000018F00000037.gz: No such file or directory
>> [2006-04-20 16:41:46 MDT] LOG: could not open file "pg_xlog/000000010000018F00000037" (log file 399, segment 55): No such file or directory
>> [2006-04-20 16:41:46 MDT] LOG: redo done at 18F/36FFF254
>> sh: line 1: /wal_archive/000000010000018F00000036.gz: No such file or directory
>> [2006-04-20 16:41:46 MDT] PANIC: could not open file "pg_xlog/000000010000018F00000036" (log file 399, segment 54): No such file or directory
>> [2006-04-20 16:41:46 MDT] LOG: startup process (PID 9190) was terminated by signal 6
>> [2006-04-20 16:41:46 MDT] LOG: aborting startup due to startup process failure
>> [2006-04-20 16:41:46 MDT] LOG: logger shutting down
>>
>>
>>
>> The file /wal_archive/000000010000018F00000037.gz is there and accessible on
>> the NFS mount.
>>
>> Is there a way to continue the restore process from where it left off?
>>
>> Thanks,
>>
>> ____________________________________________________________________
>> Brendan Duddridge | CTO | 403-277-5591 x24 | brendan(at)clickspace(dot)com
>>
>> ClickSpace Interactive Inc.
>> Suite L100, 239 - 10th Ave. SE
>> Calgary, AB T2G 0V9
>>
>> http://www.clickspace.com
>>
>> On Apr 20, 2006, at 3:19 PM, Jeff Frost wrote:
>>
>>> On Thu, 20 Apr 2006, Brendan Duddridge wrote:
>>>
>>>> Hi,
>>>>
>>>> We had a database issue today that caused us to have to restore to our
>>>> most recent backup. We are using PITR so we have 3120 WAL files that need
>>>> to be applied to the database.
>>>>
>>>> After 45 minutes, it has restored only 230 WAL files. At this rate, it's
>>>> going to take about 10 hours to restore our database.
>>>>
>>>> Most of the time, the server is not using very much CPU time or I/O time.
>>>> So I'm wondering what can be done to speed up the process?
>>>
>>> Brendan,
>>>
>>> Where are the WAL files being stored and how are they being read back?
>>>
>>> --
>>> Jeff Frost, Owner <jeff(at)frostconsultingllc(dot)com>
>>> Frost Consulting, LLC http://www.frostconsultingllc.com/
>>> Phone: 650-780-7908 FAX: 650-649-1954
>>>
>>> ---------------------------(end of broadcast)---------------------------
>>> TIP 1: if posting/reading through Usenet, please send an appropriate
>>> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
>>> message can get through to the mailing list cleanly
>>>

--
Jeff Frost, Owner <jeff(at)frostconsultingllc(dot)com>
Frost Consulting, LLC http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954
