Re: Raid 10 chunksize

From: david(at)lang(dot)hm
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Stef Telford <stef(at)ummon(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Raid 10 chunksize
Date: 2009-04-01 22:59:18
Message-ID: alpine.DEB.1.10.0904011555510.28893@asgard.lang.hm
Lists: pgsql-performance

On Wed, 1 Apr 2009, Scott Carey wrote:

> On 4/1/09 9:54 AM, "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com> wrote:
>
>> On Wed, Apr 1, 2009 at 10:48 AM, Stef Telford <stef(at)ummon(dot)com> wrote:
>>> Scott Marlowe wrote:
>>>> On Wed, Apr 1, 2009 at 10:15 AM, Stef Telford <stef(at)ummon(dot)com> wrote:
>>>>
>>>>>     I do agree that the benefit is probably from write-caching, but I
>>>>> think that this is a 'win' as long as you have a UPS or BBU adaptor,
>>>>> and really, in a prod environment, not having a UPS is .. well. Crazy ?
>>>>>
>>>>
>>>> You do know that UPSes can fail, right?  En masse sometimes even.
>>>>
>>> Hello Scott,
>>>    Well, the only time the UPS has failed in my memory, was during the
>>> great Eastern Seaboard power outage of 2003. Lots of fond memories
>>> running around Toronto with a gas can looking for oil for generator
>>> power. This said though, anything could happen, the co-lo could be taken
>>> out by a meteor and then sync on or off makes no difference.
>>
>> Meteor strike is far less likely than a power surge taking out a UPS.
>> I saw a whole data center go black when a power conditioner blew out,
>> taking out the other three power conditioners, both industrial UPSes
>> and the switch for the diesel generator. And I have friends who have
>> seen the same type of thing before as well. The data is the most
>> expensive part of any server.
>>
> Yeah, well I've had a RAID card die, which broke its Battery backed cache.
> They're all unsafe, technically.
>
> In fact, not only are battery backed caches unsafe, but hard drives. They
> can return bad data. So if you want to be really safe:
>
> 1: don't use Linux -- you have to use something with full data and metadata
> checksums like ZFS or very expensive proprietary file systems.

this will involve other tradeoffs

> 2: combine it with mirrored SSD's that don't use write cache (so you can
> have fsync perf about as good as a battery backed raid card without that
> risk).

they _all_ have write caches. a beast like you are looking for doesn't
exist

> 4: keep a live redundant system with a PITR backup at another site that can
> recover in a short period of time.

a good option to keep in mind (and when the new replication code becomes
available, that will be even better)

> 3: Run in a datacenter well underground with a plutonium nuclear power
> supply. Meteor strikes and Nuclear holocaust, beware!

at some point all that will fail

but you missed point #5 (in many ways a more important point than the
others that you describe)

switch from using postgres to using a database that can do two-phase
commits across redundant machines so that you know the data is safe on
multiple systems before the command is considered complete.
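
A minimal sketch of the two-phase idea described above, assuming a hypothetical in-memory `Replica` class standing in for each redundant machine (it is not any real database's API): the write is reported complete only after every machine has prepared it and then committed it.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory stand-in for one redundant database machine."""
    name: str
    healthy: bool = True
    staged: dict = field(default_factory=dict)     # prepared, not yet visible
    committed: dict = field(default_factory=dict)  # visible data

    def prepare(self, txid, key, value):
        # Phase 1: stage the write and vote; an unhealthy replica votes "no".
        if not self.healthy:
            return False
        self.staged[txid] = (key, value)
        return True

    def commit(self, txid):
        # Phase 2: make the staged write visible.
        key, value = self.staged.pop(txid)
        self.committed[key] = value

    def rollback(self, txid):
        # Discard the staged write, if there is one.
        self.staged.pop(txid, None)


def two_phase_commit(replicas, txid, key, value):
    """Return True only if every replica prepared and then committed."""
    prepared = []
    for replica in replicas:
        if replica.prepare(txid, key, value):
            prepared.append(replica)
        else:
            # Any "no" vote aborts the transaction on every machine.
            for p in prepared:
                p.rollback(txid)
            return False
    for replica in replicas:
        replica.commit(txid)
    return True
```

With two healthy replicas the call returns True and both hold the value; if one replica is marked unhealthy, the call returns False and nothing changes on either machine. A real system also has to handle a machine failing between the two phases, which this sketch leaves out.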

David Lang

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: Stef Telford <stef(at)ummon(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Raid 10 chunksize
Date: Wed, 1 Apr 2009 17:39:29 -0600
Message-ID: <dcc563d10904011639g58f2ee2cx64d49e98665ac81b(at)mail(dot)gmail(dot)com>

On Wed, Apr 1, 2009 at 4:15 PM, Scott Carey <scott(at)richrelevance(dot)com> wrote:
>
> On 4/1/09 9:54 AM, "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com> wrote:
>
>> On Wed, Apr 1, 2009 at 10:48 AM, Stef Telford <stef(at)ummon(dot)com> wrote:
>>> Scott Marlowe wrote:
>>>> On Wed, Apr 1, 2009 at 10:15 AM, Stef Telford <stef(at)ummon(dot)com> wrote:
>>>>
>>>>>     I do agree that the benefit is probably from write-caching, but I
>>>>> think that this is a 'win' as long as you have a UPS or BBU adaptor,
>>>>> and really, in a prod environment, not having a UPS is .. well. Crazy ?
>>>>>
>>>>
>>>> You do know that UPSes can fail, right?  En masse sometimes even.
>>>>
>>> Hello Scott,
>>>    Well, the only time the UPS has failed in my memory, was during the
>>> great Eastern Seaboard power outage of 2003. Lots of fond memories
>>> running around Toronto with a gas can looking for oil for generator
>>> power. This said though, anything could happen, the co-lo could be taken
>>> out by a meteor and then sync on or off makes no difference.
>>
>> Meteor strike is far less likely than a power surge taking out a UPS.
>> I saw a whole data center go black when a power conditioner blew out,
>> taking out the other three power conditioners, both industrial UPSes
>> and the switch for the diesel generator.  And I have friends who have
>> seen the same type of thing before as well.  The data is the most
>> expensive part of any server.
>>
> Yeah, well I've had a RAID card die, which broke its Battery backed cache.
> They're all unsafe, technically.

That's why you use two controllers with mirror sets across them and
then RAID-0 across the top. But I know what you mean. Now the mobo
and memory are the single point of failure. Next stop, sequent etc.
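
A rough sketch of that layout, assuming made-up controller names `ctrlA` and `ctrlB` and equal disk counts on each (none of this is a real RAID tool's interface): every mirror pair takes one disk from each controller, and chunks are striped across the pairs.

```python
def build_layout(disks_per_controller):
    """Each mirror pair uses one disk from each (hypothetical) controller."""
    return [
        (f"ctrlA/disk{i}", f"ctrlB/disk{i}")
        for i in range(disks_per_controller)
    ]


def disks_for_chunk(layout, chunk_index):
    """RAID-0 over the mirrors: a chunk lands on one pair, on both its disks."""
    return layout[chunk_index % len(layout)]


def survives_controller_loss(layout, dead_controller):
    """True if every mirror pair still has a disk on a surviving controller."""
    prefix = dead_controller + "/"
    return all(
        any(not disk.startswith(prefix) for disk in pair)
        for pair in layout
    )
```

Because each pair spans both controllers, `survives_controller_loss(build_layout(2), "ctrlA")` is True: losing a whole controller still leaves one copy of every chunk.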

> In fact, not only are battery backed caches unsafe, but hard drives.  They
> can return bad data.  So if you want to be really safe:
>
> 1: don't use Linux -- you have to use something with full data and metadata
> checksums like ZFS or very expensive proprietary file systems.

You'd better be running them on sequent or Sysplex mainframe type hardware.

> 4: keep a live redundant system with a PITR backup at another site that can
> recover in a short period of time.
> 3: Run in a datacenter well underground with a plutonium nuclear power
> supply.  Meteor strikes and Nuclear holocaust, beware!

Pleaze, such hyperbole! Everyone knows it can run on uranium just as
well. I'm sure these guys:
http://royal.pingdom.com/2008/11/14/the-worlds-most-super-designed-data-center-fit-for-a-james-bond-villain/
can sort that out for you.
