Re: Slow PITR restore

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Jeff Trout" <threshar(at)threshar(dot)is-a-geek(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Slow PITR restore
Date: 2007-12-13 06:27:44
Message-ID: 874pen2jof.fsf@oxford.xeocode.com
Lists: pgsql-general pgsql-hackers

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> "Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
>
>> Exactly. Which is the point I am making. Five minutes of transactions
>> is nothing (speaking generally).. In short, if we are in recovery, and
>> we are not saturated the I/O and at least a single CPU, there is a huge
>> amount of optimization *somewhere* to be done.
>
> You sure about that? I tested CVS HEAD just now, by setting the
> checkpoint_ parameters really high, running pgbench for awhile, and
> then killing the bgwriter to force a recovery cycle over all the WAL
> generated by the pgbench run. What I saw was that the machine was 100%
> disk write bound. Increasing shared_buffers helped, not in that the
> write rate got less according to vmstat, but the completion time did.

There are at least three definitions of "saturating the I/O" and it sounds
like you two are using two different ones.

1) The processor is waiting on I/O all the time
2) The hard drives are all always handling a request
3) The hard drives are all handling the full bandwidth they're capable of

You would expect (1) and (2) to be the same for a single drive -- though in
practice there seems to be a gap even between them. But for a raid array there
can be a large difference, and the wider the raid stripe the larger the
difference.

In Tom's results:

> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
>  0  9  70024  29232  19876 824368    0    0     0  3152 1447  233  0  1  0 99  0
>  0  9  70024  29232  19876 824368    0    0     0  3660 1474  252  0  1  0 99  0
>  0  8  70024  28960  19876 824404    0    0     0  3176 1448  265  0  2  1 97  0
>
> I don't see the machine sitting around doing nothing ...

Note that even though the processor is 99% in wait state the drive is only
handling about 3 MB/s. That translates into roughly 2.2ms per random write,
essentially seek time, which is actually pretty fast. So if this is a single
drive, (1) and (2) seem to be pretty much the same here.
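
For what it's worth, here is the back-of-the-envelope arithmetic behind that
figure. A trivial sketch, assuming vmstat's "bo" column is in 1kB blocks, each
random write is a single 8kB Postgres page, and the drive handles one write at
a time:

/* Turn the vmstat "bo" rate into a per-write latency.
 * Assumptions (mine): bo is in 1kB blocks, one 8kB page per random write. */
#include <stdio.h>

int main(void)
{
    double bo_kb_per_sec = 3660.0;                    /* middle vmstat sample */
    double page_kb = 8.0;                             /* one Postgres block */
    double writes_per_sec = bo_kb_per_sec / page_kb;  /* ~457 random writes/s */
    double ms_per_write = 1000.0 / writes_per_sec;    /* ~2.2 ms per write */

    printf("%.0f writes/s => %.1f ms per write\n", writes_per_sec, ms_per_write);
    return 0;
}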

But note that if this were a raid array Postgres wouldn't be getting any
better results. A raid array wouldn't improve I/O latency at all, and since
the process is already 99% waiting for I/O Postgres isn't going to be able to
issue any more requests. Only one drive in the array would be busy at a time,
which is far less than the maximum random-access I/O the array is capable of.

Heikki proposed a while back to use posix_fadvise() when processing the logs
to read ahead the blocks that recovery will need before actually attempting to
apply them. On a raid array that would bring the 3MB/s above up to the maximum
number of random accesses the array can handle (ie, definition (2) above).
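
To be clear, I haven't seen the patch; a minimal sketch of the idea (the
helper name and block-number arithmetic here are mine, not Heikki's) would be
something like:

/* Sketch only, not Heikki's actual patch: while scanning WAL ahead of the
 * redo position, hint the kernel about each data block an upcoming record
 * will touch, so the reads are issued concurrently instead of one at a time. */
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>

#define BLOCK_SIZE 8192          /* size of one Postgres page */

void
prefetch_data_block(int fd, long blkno)
{
    /* POSIX_FADV_WILLNEED starts an asynchronous read-ahead of the range;
     * on a raid array many such requests can be in flight at once, which is
     * what gets you from definition (1) up to definition (2). */
    (void) posix_fadvise(fd, (off_t) blkno * BLOCK_SIZE, BLOCK_SIZE,
                         POSIX_FADV_WILLNEED);
}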

That's still going to be a far cry from the maximum bandwidth the drive can
handle. Even that single drive can probably do 60MB/s of sequential I/O.
That's probably the source of the unrealistic expectations people have: it's
easy to treat the bandwidth number as the headline figure for measuring I/O
utilization.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!
