Re: What exactly is postgres doing during INSERT/UPDATE ?

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Luke Koops <luke(dot)koops(at)entrust(dot)com>, Joseph S <jks(at)selectacast(dot)net>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: What exactly is postgres doing during INSERT/UPDATE ?
Date: 2009-08-31 14:38:25
Message-ID: b42b73150908310738k4d8deb8fu302658215f9705d4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Sun, Aug 30, 2009 at 7:38 PM, Greg Stark<gsstark(at)mit(dot)edu> wrote:
> On Sun, Aug 30, 2009 at 11:56 PM, Merlin Moncure<mmoncure(at)gmail(dot)com> wrote:
>> 192k written
>>  raid 10: six writes
>>  raid 5: four writes, one read (but the read and one of the writes is
>> same physical location)
>>
>> now, by 'same physical' location, that may mean that the drive head
>> has to move if the data is not in cache.
>>
>> I realize that many raid 5 implementations tend to suck.  That said,
>> raid 5 should offer higher theoretical performance for writing than
>> raid 10, both for sequential and random.
>
> In the above there are two problems.
>
> 1) 192kB is not a random access pattern. Any time you're writing a
> whole raid stripe or more then RAID5 can start performing reasonably
> but that's not random, that's sequential i/o. The relevant random i/o
> pattern is writing 8kB chunks at random offsets into a multi-terabyte
> storage which doesn't fit in cache.
>
> 2) It's not clear but I think you're saying "but the read and one of
> the writes is same physical location" on the basis that this mitigates
> the costs. In fact it's the worst case. It means after doing the read
> and calculating the parity block the drive must then spin a full
> rotation before being able to write it back out. So instead of an
> average latency of 1/2 of a rotation you have that plus a full
> rotation, or 3x as much latency before the write can be performed as
> without raid5.
>
> It's not a fault of the implementations, it's a fundamental problem
> with RAId5. Even a spectacular implementation of RAID5 will be awful
> for random access writes. The only saving grace some hardware
> implementations have is having huge amounts of battery backed cache
> which mean that they can usually buffer all the writes for long enough
> that the access patterns no longer look random. If you buffer enough
> then you can hope you'll eventually overwrite the whole stripe and can
> write out the new parity without reading the old data. Or failing that
> you can perform the reads of the old data when it's convenient because
> you're reading nearby data effectively turning it into sequential i/o.

I agree, that's good analysis. The main point I was making was that
if you have say a 10 disk raid 5, you don't involve 10 disks, only
two...a very common misconception. I made another mistake that you
didn't catch: you need to read *both* the data drive and the parity
drive before writing, not just the parity drive.

I wonder if flash SSD are a better fit for raid 5 since the reads are
much cheaper than writes and there is no rotational latency. (also,
$/gb is different, and so are the failure cases).

merlin

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Aidan Van Dyk 2009-08-31 14:48:09 Re: What exactly is postgres doing during INSERT/UPDATE ?
Previous Message Greg Stark 2009-08-30 23:38:33 Re: What exactly is postgres doing during INSERT/UPDATE ?