Avoiding unnecessary reads in recovery

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Avoiding unnecessary reads in recovery
Date: 2007-04-25 12:48:51
Message-ID: 462F4E33.50904@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

In recovery, with full_pages_writes=on, we read in each page only to
overwrite the contents with a full page image. That's a waste of time,
and can have a surprisingly large effect on recovery time.

As a quick test on my laptop, I initialized a DBT-2 test with 5
warehouses, and let it run for 2 minutes without think-times to generate
some WAL. Then I did a "kill -9 postmaster", and took a copy of the data
directory to use for testing recovery.

With CVS HEAD, the recovery took ~ 2 minutes. With the attached patch,
it took 5 seconds. (yes, I used the same not-yet-recovered data
directory in both tests, and cleared the os cache with "echo 1 >
/proc/sys/vm/drop_caches").

I was surprised how big a difference it makes, but when you think about
it it's logical. Without the patch, it's doing roughly the same I/O as
the test itself, reading in pages, modifying them, and writing them
back. With the patch, all the reads are done sequentially from the WAL,
and then written back in a batch at the end of the WAL replay which is a
lot more efficient.

It's interesting that (with the patch) full_page_writes can *shorten*
your recovery time. I've always thought it to have a purely negative
effect on performance.

I'll leave it up to the jury if this tiny little change is appropriate
after feature freeze...

While working on this, this comment in ReadBuffer caught my eye:

> /*
> * During WAL recovery, the first access to any data page should
> * overwrite the whole page from the WAL; so a clobbered page
> * header is not reason to fail. Hence, when InRecovery we may
> * always act as though zero_damaged_pages is ON.
> */
> if (zero_damaged_pages || InRecovery)
> {

But that assumption only holds if full_page_writes is enabled, right? I
changed that in the attached patch as well, but if it isn't accepted
that part of it should still be applied, I think.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
skip-unnecessary-reads-on-recovery.patch text/x-diff 5.6 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kenneth Marshall 2007-04-25 12:56:49 Re: [HACKERS] Full page writes improvement, code update
Previous Message Dave Page 2007-04-25 12:31:18 Re: Buildfarm: Stage logs not available for MSVC builds