Re: Slow PITR restore

From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Trout <threshar(at)threshar(dot)is-a-geek(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Slow PITR restore
Date: 2007-12-13 06:04:57
Message-ID: 4760CB89.4080408@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Tom Lane wrote:
> "Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
>> Jeff Trout <threshar(at)threshar(dot)is-a-geek(dot)com> wrote:
>>> in this case it was 24hrs of data - about 1500 wal segments. During
>>> this time the machine was nearly complete idle and there wasn't very
>>> much IO going on (few megs/sec).
>
>> Exactly. Which is the point I am making. Five minutes of transactions
>> is nothing (speaking generally).. In short, if we are in recovery, and
>> we are not saturated the I/O and at least a single CPU, there is a huge
>> amount of optimization *somewhere* to be done.
>
> You sure about that? I tested CVS HEAD just now, by setting the
> checkpoint_ parameters really high,

Well I haven't tested CVS HEAD yet, but on 8.2:

> avg-cpu: %user %nice %sys %iowait %idle
> 0.12 0.00 0.30 12.22 87.35
>
> Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
> sda 172.26 2927.74 46.31 14668 232
> sda1 182.63 2927.74 46.31 14668 232
> sdb 1.40 0.00 41.52 0 208
> sdb1 10.38 0.00 41.52 0 208
> sdb2 0.00 0.00 0.00 0 0
> sdc 96.21 946.11 14.37 4740 72
> sdc1 99.20 946.11 14.37 4740 72
> sdd 0.00 0.00 0.00 0 0
> sdd1 0.00 0.00 0.00 0 0

And:

> 2007-12-13 00:55:20 EST LOG: restored log file "00000001000007E10000006B" from archive
> 2007-12-13 00:55:34 EST LOG: restored log file "00000001000007E10000006C" from archive
> 2007-12-13 00:55:48 EST LOG: restored log file "00000001000007E10000006D" from archive
> 2007-12-13 00:56:01 EST LOG: restored log file "00000001000007E10000006E" from archive
> 2007-12-13 00:56:14 EST LOG: restored log file "00000001000007E10000006F" from archive
> 2007-12-13 00:56:28 EST LOG: restored log file "00000001000007E100000070" from archive
> 2007-12-13 00:56:42 EST LOG: restored log file "00000001000007E100000071" from archive

This is a 14 Spindles in RAID 10 on sda and sdc. As a correlation, exact
same hardware (and OS) and off peak production load of 2.25 million
xacts per hour (at data point), 3million per hour at peak.:

> avg-cpu: %user %nice %sys %iowait %idle
> 23.30 0.00 15.28 33.07 28.35
>
> Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
> sda 2078.80 23693.60 10628.00 118468 53140
> sda1 4139.80 23693.60 10628.00 118468 53140
> sdb 24.00 94.40 318.40 472 1592
> sdb1 87.40 94.40 318.40 472 1592
> sdc 694.00 22207.20 4113.60 111036 20568
> sdc1 1575.00 22207.20 4113.60 111036 20568
> sdd 826.80 3.20 8691.20 16 43456
> sdd1 828.60 3.20 8691.20 16 43456

I would expect that recovery would be faster than prod. Maybe I am off
my rocker but the warm standby isn't doing anything but restoring the logs.

This is going to be hard to test with 8.3 with this data set but I will
see if I can get some other numbers with 8.3.

Sincerely,

Joshua D. Drake

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2007-12-13 06:25:08 Re: Possible bug in PostgreSQL 8.3beta4 (postgres process segfaults)
Previous Message Tatsuo Ishii 2007-12-13 05:58:18 Re: timestamp with time zone

Browse pgsql-hackers by date

  From Date Subject
Next Message Gregory Stark 2007-12-13 06:27:44 Re: Slow PITR restore
Previous Message Greg Smith 2007-12-13 02:10:51 Re: [HACKERS] "distributed checkpoint"