Re: Raid 5 vs Raid 10 Benchmarks Using bonnie++

From: david(at)lang(dot)hm
To: Aidan Van Dyk <aidan(at)highrise(dot)ca>
Cc: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Ogden <lists(at)darkstatic(dot)com>, pgsql-performance(at)postgresql(dot)org, Greg Smith <greg(at)2ndquadrant(dot)com>
Subject: Re: Raid 5 vs Raid 10 Benchmarks Using bonnie++
Date: 2011-09-13 00:47:59
Message-ID: alpine.DEB.2.02.1109121741440.522@asgard.lang.hm
Lists: pgsql-performance

On Mon, 12 Sep 2011, Aidan Van Dyk wrote:

> On Mon, Sep 12, 2011 at 6:57 PM, <david(at)lang(dot)hm> wrote:
>
>>> The "barrier" is the linux fs/block way of saying "these writes need
>>> to be on persistent media before I can depend on them".  On typical
>>> spinning media disks, that means out of the disk cache (which is not
>>> persistent) and on platters.  The way it assures that the writes are
>>> on "persistent media" is with a "flush cache" type of command.  The
>>> "flush cache" is a close approximation to "make sure it's persistent".
>>>
>>> If your cache is battery backed, it is now persistent, and there is no
>>> need to "flush cache", hence the nobarrier option if you believe your
>>> cache is persistent.
>>>
>>> Now, make sure that even though your raid cache is persistent, your
>>> disks have cache in write-through mode, because it would suck for your
>>> raid cache to "work", but believe the data is safely on disk and only
>>> find out that it was in the disk's (small) cache, and your raid is
>>> out of sync after an outage because of that...  I believe most raid
>>> cards will handle that correctly for you automatically.
>>
>> if you don't have barriers enabled, the data may not get written out of main
>> memory to the battery backed memory on the card as the OS has no reason to
>> do the write out of the OS buffers now rather than later.
>
> It's not quite so simple. The "sync" calls (pick your flavour) are
> what tell the OS buffers they have to go out. The syscall (on a
> working FS) won't return until the write and data has reached the
> "device" safely, and is considered persistent.
>
> But in linux, a barrier is actually a "synchronization" point, not
> just a "flush cache"... It's a "guarantee everything up to now is
> persistent, I'm going to start counting on it". But depending on your
> card, drivers and yes, kernel version, that "barrier" is sometimes a
> "drain/block I/O queue, issue cache flush, wait, write specific data,
> flush, wait, open I/O queue". The double flush is because it needs to
> guarantee everything previous is good before it writes the "critical"
> piece, and then needs to guarantee that too.
>
> Now, on good raid hardware it's not usually that bad.
>
> And then, just to confuse people more, LVM up until 2.6.29 (so that
> includes all those RHEL5/CentOS5 installs out there which default to
> using LVM) didn't handle barriers, it just sort of threw them out as
> it came across them, meaning that you got the performance of
> nobarrier, even if you thought you were using barriers on poor raid
> hardware.

this is part of the problem.

if you have a simple fs-on-hardware you may be able to get away with the
barriers, but if you have fs-on-x-on-y-on-hardware type of thing
(specifically where LVM is one of the things in the middle), and those
things in the middle do not honor barriers, the fsync becomes meaningless
because without propagating the barrier down the stack, the writes that
the fsync triggers may not get to the disk.
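
The stacking argument above can be sketched as a toy model. This is only an illustration in Python, not how the kernel represents the block stack: the Layer class, the first_layer_dropping_barriers helper, and the example stacks are hypothetical, and the only fact taken from the thread is that LVM before 2.6.29 is said to discard barriers.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Layer:
    """One layer between the filesystem and the physical disk (hypothetical model)."""
    name: str
    passes_barriers: bool


def first_layer_dropping_barriers(stack: Sequence[Layer]) -> Optional[str]:
    """Return the name of the topmost layer that discards barriers, or None.

    `stack` is ordered from the filesystem down to the hardware. A barrier
    issued at the top only reaches the disk if every layer passes it on.
    """
    for layer in stack:
        if not layer.passes_barriers:
            return layer.name
    return None


# Assumed example stacks; the LVM behaviour follows the message quoted above.
old_stack = [
    Layer("ext3", True),
    Layer("LVM (kernel < 2.6.29)", False),
    Layer("RAID controller", True),
]
simple_stack = [Layer("ext3", True), Layer("RAID controller", True)]

print(first_layer_dropping_barriers(old_stack))
print(first_layer_dropping_barriers(simple_stack))
```

A single layer that drops barriers is enough to break the guarantee, no matter how well the layers above and below it behave.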

>> Every raid card I have seen has ignored the 'flush cache' type of command if
>> it has a battery and that battery is good, so you leave the barriers enabled
>> and the card still gives you great performance.
>
> XFS FAQ goes over much of it, starting at Q24:
> http://xfs.org/index.php/XFS_FAQ#Q:_What_is_the_problem_with_the_write_cache_on_journaled_filesystems.3F
>
> So, for pure performance, on a battery-backed controller, nobarrier is
> the recommended *performance* setting.
>
> But, to throw a wrench into the plan, what happens when during normal
> battery tests, your raid controller decides the battery is failing...
> of course, it's going to start screaming and send all your monitoring
> alarms off (you're monitoring that, right?), but have you thought to
> make sure that your FS is remounted with barriers at the first sign of
> battery trouble?

yep.

on a good raid card with battery backed cache, the performance difference
between barriers being on and barriers being off should be minimal. If
it's not, I think that you have something else going on.
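
One rough way to check that claim on a given box is to time small fsync'd writes on the volume in question, once with barriers on and once with them off. The following is a minimal Python sketch under stated assumptions: the function name, the default of 50 writes, and the 8192-byte block (chosen to match PostgreSQL's default page size) are illustrative, and this is not a substitute for bonnie++ or a real database load.

```python
import os
import tempfile
import time


def mean_fsync_latency(directory: str, writes: int = 50, block: int = 8192) -> float:
    """Return the mean seconds per write+fsync for a temp file in `directory`."""
    payload = b"\0" * block
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to push this write to the device
        elapsed = time.perf_counter() - start
    finally:
        # Always clean up the scratch file, even if a write fails.
        os.close(fd)
        os.unlink(path)
    return elapsed / writes


if __name__ == "__main__":
    # Point this at a directory on the filesystem under test.
    latency = mean_fsync_latency(tempfile.gettempdir())
    print(f"mean write+fsync latency: {latency * 1000:.3f} ms")
```

If the two runs differ by a large factor on a controller with a healthy battery-backed cache, that would be a hint that something else in the stack is misbehaving.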

David Lang

From: Anthony Presley <anthony(at)resolution(dot)com>
To: Alan Hodgson <ahodgson(at)simkin(dot)ca>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: RAID Controller (HP P400) beat by SW-RAID?
Date: Mon, 12 Sep 2011 20:08:48 -0500
Message-ID: CAO2Axyp-F5FF1ysjk9OGuxpTVaQUWDqyG1kGuZ0L7LUh3HGXzg(at)mail(dot)gmail(dot)com

So, today, I did the following:

- Swapped out the 5410's (2.3GHz) for 5470's (3.33GHz)
- Set the ext4 mount options to be noatime,barrier=0,data=writeback
- Installed PG 9.1 from the yum repo
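
Since barrier=0 is easy to leave in place by accident, a small check of the mounted options can help. Below is a hedged Python sketch that parses /proc/mounts-style text and reports filesystems mounted with barriers off. The function names and the sample devices and paths are made up for illustration, and treating "no barrier option listed" as "barriers on" is an assumption that should be verified for the filesystem and kernel in use.

```python
from typing import List, Tuple


def barriers_enabled(options: str, default: bool = True) -> bool:
    """Decide from a comma-separated mount option string whether barriers are on.

    Recognises barrier, barrier=1, barrier=0 and nobarrier; a later option
    overrides an earlier one. `default` is an assumption used when no
    barrier option is listed at all.
    """
    enabled = default
    for opt in options.split(","):
        opt = opt.strip()
        if opt in ("barrier", "barrier=1"):
            enabled = True
        elif opt in ("nobarrier", "barrier=0"):
            enabled = False
    return enabled


def mounts_without_barriers(mounts_text: str) -> List[Tuple[str, str]]:
    """Return (mount point, fs type) for /proc/mounts-style lines with barriers off."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if not barriers_enabled(options):
            found.append((mount_point, fs_type))
    return found


# Hypothetical /proc/mounts content; device names and paths are made up.
sample = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /var/lib/pgsql ext4 rw,noatime,barrier=0,data=writeback 0 0
"""
print(mounts_without_barriers(sample))
```

On a live Linux system the input would come from reading /proc/mounts, and the result could feed whatever monitoring already watches the controller battery.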

Item one:
With the accelerator cache set to 0/100 (all 512MB for writing), loading
the db / creating the indexes was about 8 minutes faster. Was hoping for
more, but didn't get it. If I split the CREATE INDEXes into separate psql
instances, will that be done in parallel?
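
For what it's worth, each psql instance connects to its own backend process, so statements issued from separate sessions can run at the same time, subject to CPU, disk, and memory limits. A minimal Python sketch of launching several CREATE INDEX statements that way follows. The database, table, and index names are hypothetical, the helper names are made up, and the actual launch is left commented out because it needs a reachable server.

```python
import subprocess
from typing import List, Sequence


def build_psql_commands(dbname: str, statements: Sequence[str]) -> List[List[str]]:
    """Build one psql argv per statement so each runs in its own session."""
    return [
        # -X skips ~/.psqlrc; ON_ERROR_STOP makes psql exit non-zero on failure.
        ["psql", "-X", "-v", "ON_ERROR_STOP=1", "-d", dbname, "-c", sql]
        for sql in statements
    ]


def run_in_parallel(commands: Sequence[Sequence[str]]) -> List[int]:
    """Start every command at once, then wait for all of them; return exit codes."""
    procs = [subprocess.Popen(list(cmd)) for cmd in commands]
    return [p.wait() for p in procs]


# Hypothetical database, table, and index names.
statements = [
    "CREATE INDEX orders_customer_idx ON orders (customer_id)",
    "CREATE INDEX orders_created_idx ON orders (created_at)",
]
commands = build_psql_commands("mydb", statements)
# exit_codes = run_in_parallel(commands)  # needs a reachable PostgreSQL server
```

Whether this shortens the total load time depends on how many cores are free and whether the disks can keep up with several index builds at once.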

Item two:
I'm still getting VERY strange results in my SELECT queries.

For example, on the new server:
http://explain.depesz.com/s/qji - This takes 307ms, all the time. Doesn't
matter if it's "cached", or fresh from a reboot.

Same query on the live / old server:
http://explain.depesz.com/s/8Pd - This can take 2-3s the first time, but
then takes 42ms once it's cached.

Both of these servers have the same indexes, and almost identical data.
However, the old server is doing some different planning than the new
server.

What did I switch (or should I unswitch)?

--
Anthony

On Sun, Sep 11, 2011 at 9:12 PM, Alan Hodgson <ahodgson(at)simkin(dot)ca> wrote:

> On September 11, 2011 03:44:34 PM Anthony Presley wrote:
> > First thing I noticed is that it takes the same amount of time to load the
> > db (about 40 minutes) on the new hardware as the old hardware. I was
> > really hoping with the faster, additional drives and a hardware RAID
> > controller, that this would be faster. The database is only about 9GB
> > with pg_dump (about 28GB with indexes).
>
> Loading the DB is going to be CPU-bound (on a single core), unless your disks
> really suck, which they don't. Most of the time will be spent building
> indexes.
>
> I don't know offhand why the queries are slower, though, unless you're not
> getting as much cached before testing as on the older box.

