Re: [HACKERS] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org, Michael Clemmons <glassresistor(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Subject: Re: [HACKERS] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)
Date: 2010-01-27 07:21:44
Message-ID: 4B5FE988.3070604@2ndquadrant.com
Lists: pgsql-hackers pgsql-performance

Greg Stark wrote:
> Actually before we get there could someone who demonstrated the
> speedup verify that this patch still gets that same speedup?
>

Let's step back a second and get to the bottom of why some people are
seeing this and others aren't. The original report here suggested this
was an ext4 issue. As I pointed out recently on the performance list,
the likely reason is that ext4's working write-barrier support passes
the fsync through to "lying" hard drives as a proper cache flush, which
didn't happen on your typical ext3 install. Given that, I'd expect to
see the same issue on ext3 given a drive with its write cache turned
off, so that's the theory I set out to prove before seeing the patch
operate.

What I did was create a little test program that created 5 databases and
then dropped them:

\timing
create database a;
create database b;
create database c;
create database d;
create database e;
drop database a;
drop database b;
drop database c;
drop database d;
drop database e;

(All of the drop times were very close by the way; around 100ms, nothing
particularly interesting there)
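
For anyone who wants to repeat this with a different number of databases,
here is a minimal sketch that generates the same kind of psql script. The
function name build_timing_script is made up for illustration, and it
assumes the database names are simple lowercase identifiers that need no
quoting.

```python
def build_timing_script(names):
    """Return a psql script that times creating, then dropping, each database.

    Hypothetical helper: names are assumed to be plain identifiers,
    so no quoting or escaping is attempted.
    """
    lines = [r"\timing"]  # psql meta-command that turns on per-statement timing
    lines += [f"create database {name};" for name in names]
    lines += [f"drop database {name};" for name in names]
    return "\n".join(lines) + "\n"


# Same five databases as the script above.
script = build_timing_script(["a", "b", "c", "d", "e"])
print(script, end="")
```

The output can be saved to a file and run with psql -f, or piped straight
into psql, against a scratch cluster where creating and dropping databases
named a through e is harmless.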

If I have my system's boot drive (attached to the motherboard, not on
the caching controller) in its regular, lying mode with write cache on,
the creates take the following times:

Time: 713.982 ms
Time: 659.890 ms
Time: 590.842 ms
Time: 675.506 ms
Time: 645.521 ms

A second run gives similar results; seems quite repeatable for every
test I ran so I'll just show one run of each.

If I then turn off the write-cache on the drive:

$ sudo hdparm -W 0 /dev/sdb

And repeat, these times show up instead:

Time: 6781.205 ms
Time: 6805.271 ms
Time: 6947.037 ms
Time: 6938.644 ms
Time: 7346.838 ms

So there's the problem case reproduced, right on regular old ext3 and
Ubuntu Jaunty: around 7 seconds to create a database, not real impressive.

Applying the last patch you attached, with the cache on, I see this:

Time: 396.105 ms
Time: 389.984 ms
Time: 469.800 ms
Time: 386.043 ms
Time: 441.269 ms

And if I then turn the write cache off, the times are slow again, but
much better than before:

Time: 2162.687 ms
Time: 2174.057 ms
Time: 2215.785 ms
Time: 2174.100 ms
Time: 2190.811 ms

That makes the average times I'm seeing on my server:

           Cached     Uncached
HEAD       657 ms     6964 ms
Patched    417 ms     2183 ms

Modest speedup even with a caching drive, and a huge speedup in the case
when you have one with slow fsync. It looks to me like if you address
Tom's concern about documentation and function naming, committing this
patch will certainly deliver as promised on the performance side. Maybe
2 seconds is still too long for some people, but it's at least a whole
lot better.
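
As a cross-check on the averages, here is a small sketch that recomputes
them from the individual timings quoted above and derives the speedup
ratio for each cache mode. The dictionary layout and the speedup helper
are just illustrative choices; only the numbers come from the runs.

```python
from statistics import mean

# CREATE DATABASE timings in ms, copied from the four runs above.
timings_ms = {
    ("HEAD", "cached"): [713.982, 659.890, 590.842, 675.506, 645.521],
    ("HEAD", "uncached"): [6781.205, 6805.271, 6947.037, 6938.644, 7346.838],
    ("patched", "cached"): [396.105, 389.984, 469.800, 386.043, 441.269],
    ("patched", "uncached"): [2162.687, 2174.057, 2215.785, 2174.100, 2190.811],
}

averages = {key: mean(values) for key, values in timings_ms.items()}


def speedup(cache_mode):
    """Ratio of the unpatched (HEAD) average to the patched average."""
    return averages[("HEAD", cache_mode)] / averages[("patched", cache_mode)]


for (build, cache_mode), avg in averages.items():
    print(f"{build:8s} {cache_mode:9s} {avg:7.0f} ms")
for cache_mode in ("cached", "uncached"):
    print(f"{cache_mode}: {speedup(cache_mode):.1f}x faster with the patch")
```

On these numbers the patch works out to roughly 1.6x faster with the
write cache on and roughly 3.2x faster with it off.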

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.co
