Re: Scaling with memory & disk planning

From: terry(at)greatgulfhomes(dot)com
To: "'Jean-Luc Lachance'" <jllachan(at)nsd(dot)ca>
Cc: <kgunders(at)cbnlottery(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Scaling with memory & disk planning
Date: 2002-05-30 19:31:33
Message-ID: 001e01c20810$9c0b9960$2766f30a@development.greatgulfhomes.com
Lists: pgsql-general

My simplification was intentional; either way it comes out the same, because
on a performance machine (lots of memory) reads are mostly served from
cache rather than by disk IO. So the real cost is the disk writes, and 2 = 2.

Terry Fielder
Network Engineer
Great Gulf Homes / Ashton Woods Homes
terry(at)greatgulfhomes(dot)com

> -----Original Message-----
> From: pgsql-general-owner(at)postgresql(dot)org
> [mailto:pgsql-general-owner(at)postgresql(dot)org]On Behalf Of Jean-Luc Lachance
> Sent: Thursday, May 30, 2002 3:17 PM
> To: terry(at)greatgulfhomes(dot)com
> Cc: kgunders(at)cbnlottery(dot)com; pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] Scaling with memory & disk planning
>
>
> I think your understanding of RAID 5 is wrong also.
>
> For a general N-disk RAID 5 the process is:
> 1) Read the old data sector
> 2) XOR it with the new data to write
> 3) Read the old parity sector
> 4) XOR it with the result above
> 5) Write the new data
> 6) Write the new parity
>
> So you can see, for every logical write there are two reads and two
> writes.
>
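The six-step read-modify-write above can be sketched in Python to check the I/O count. This is a minimal illustration, not any controller's actual code: it assumes one stripe held in memory as equal-length byte blocks, and the names `Raid5Stripe` and `small_write` are hypothetical.

```python
from functools import reduce


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


class Raid5Stripe:
    """Hypothetical model of one RAID 5 stripe: N-1 data blocks plus parity."""

    def __init__(self, data_blocks):
        self.data = list(data_blocks)
        self.parity = reduce(xor, self.data)  # parity = XOR of all data blocks
        self.reads = 0
        self.writes = 0

    def small_write(self, i: int, new: bytes) -> None:
        """Overwrite one data block using read-modify-write."""
        old = self.data[i]                    # 1) read old data sector
        self.reads += 1
        delta = xor(old, new)                 # 2) XOR with the new data
        old_parity = self.parity              # 3) read old parity sector
        self.reads += 1
        new_parity = xor(old_parity, delta)   # 4) XOR with the result above
        self.data[i] = new                    # 5) write new data
        self.writes += 1
        self.parity = new_parity              # 6) write new parity
        self.writes += 1


stripe = Raid5Stripe([b"\x01\x02", b"\x10\x20", b"\xaa\xbb", b"\x0f\x0f"])
stripe.small_write(2, b"\x00\xff")
print(stripe.reads, stripe.writes)                # 2 2
print(stripe.parity == reduce(xor, stripe.data))  # True
```

Only the changed block and the parity block are touched, which is why the cost stays at two reads and two writes no matter how many disks are in the stripe.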
> For a 3-disk RAID 5 the process can be shortened:
> 1) Write the new data
> 2) Read the other data disk
> 3) XOR it with the new data
> 4) Write the result to the parity disk
>
> So, two writes and one read.
>
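The 3-disk shortcut works because the stripe has only one other data block, so the new parity can be computed from it directly instead of from the old data and old parity. A minimal Python sketch, assuming the three disks are modelled as a list of equal-length byte blocks; `three_disk_write` is a hypothetical helper name.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def three_disk_write(disks: list, parity_idx: int, target_idx: int, new: bytes):
    """Hypothetical 3-disk RAID 5 write; returns (reads, writes)."""
    if {parity_idx, target_idx} - {0, 1, 2} or parity_idx == target_idx:
        raise ValueError("need two distinct disk indices in 0..2")
    other_idx = 3 - parity_idx - target_idx   # the remaining data disk
    disks[target_idx] = new                    # 1) write data
    other = disks[other_idx]                   # 2) read other disk
    disks[parity_idx] = xor(new, other)        # 3) XOR, 4) write parity
    return 1, 2


disks = [b"\x0f", b"\xf0", b"\xff"]  # disk 2 holds parity 0x0f ^ 0xf0
print(three_disk_write(disks, parity_idx=2, target_idx=0, new=b"\x33"))  # (1, 2)
print(disks[2].hex())  # c3
```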
> JLL
>
>
> terry(at)greatgulfhomes(dot)com wrote:
> >
> > Your RAID analysis is a bit wrong.
> > In striping (disk joining), every byte written requires 1 byte sent to 1
> > disk. This gives you ZERO redundancy: RAID0 is used purely for making a
> > large partition from smaller disks.
> >
> > In RAID1 (mirroring), every byte written requires 1 byte written to EACH
> > of the 2 mirrored disks, for a total disk IO of 2 bytes.
> >
> > In RAID5, the most efficient solution, every byte written requires LESS
> > than 1 additional byte written for the CRC. Roughly (depending on the
> > implementation and number of disks), every 3 bytes written require 4
> > bytes of disk IO.
> >
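Terry's ratios correspond to full-stripe writes, where the parity is computed from new data already in memory and no reads are needed; a small write that touches one block costs more, as in the read-modify-write sequence Jean-Luc lists above. A minimal Python sketch of the full-stripe accounting, with a hypothetical function name:

```python
from fractions import Fraction


def io_per_byte(level: str, n_disks: int) -> Fraction:
    """Bytes of disk IO per byte of user data, assuming full-stripe writes."""
    if level == "raid0":
        return Fraction(1)                     # each byte goes to exactly one disk
    if level == "raid1":
        if n_disks < 2:
            raise ValueError("a mirror needs at least 2 disks")
        return Fraction(n_disks)               # every mirror gets its own copy
    if level == "raid5":
        if n_disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return Fraction(n_disks, n_disks - 1)  # N-1 data bytes plus 1 parity byte
    raise ValueError(f"unknown level: {level}")


for level, n in [("raid0", 4), ("raid1", 2), ("raid5", 4), ("raid5", 7)]:
    print(level, n, io_per_byte(level, n))
```

For a 4-disk RAID 5 this gives 4/3, i.e. 3 bytes of data cost 4 bytes of disk IO, matching the rough figure above.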
> > RAID5 is the fastest from an algorithmic standpoint. There are some
> > gotchas: RAID5 implemented in hardware is faster than RAID5 implemented
> > by the OS, simply because the controller on the SCSI card acts like a
> > parallel processor.
> >
> > RAID5 also wastes the least amount of disk space.
> >
> > What is cheapest is a relative thing; what is certain is that RAID5
> > requires more disks (at least 3) than mirroring (exactly 2), but RAID5
> > wastes less space, so the cost analysis begins with a big "it depends...".
> >
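The space side of the trade-off is simple arithmetic. A Python sketch assuming identical drives and counting usable capacity in whole-drive units; the function name is hypothetical, and RAID 1+0 is included because Kurt raises it below.

```python
def usable_drives(level: str, n: int) -> int:
    """Usable capacity, in drives, of an array of n identical drives."""
    if level == "raid0":
        return n                      # striping: no redundancy
    if level == "raid1":
        if n < 2:
            raise ValueError("a mirror needs at least 2 drives")
        return 1                      # every drive holds the same copy
    if level == "raid5":
        if n < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return n - 1                  # one drive's worth of parity
    if level == "raid10":
        if n < 4 or n % 2:
            raise ValueError("RAID 1+0 needs an even number of drives, at least 4")
        return n // 2                 # half the drives are mirror copies
    raise ValueError(f"unknown level: {level}")


for level, n in [("raid1", 2), ("raid5", 3), ("raid5", 7), ("raid10", 8)]:
    print(f"{level} x{n}: {usable_drives(level, n)} of {n} drives usable")
```

With 3 drives, RAID 5 keeps 2/3 of the raw space versus 1/2 for a mirror; whether that offsets buying the extra drive is the "it depends".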
> > Any disk system will choke under heavy load, especially if the disk write
> > system is inefficient (like IBM's IDE interface). I think if you did a
> > test, you would find RAID1 would choke more than RAID5, simply because
> > RAID1 requires MORE disk IO for the same bytes being saved.
> >
> > Referring to what Tom Lane said, he recommends a 7-drive RAID5 for a very
> > good reason: the more drives, the faster the performance. Here's why:
> > write 7 bytes on a 7-drive RAID5, and the first byte goes to drive 1, the
> > 2nd byte to drive 2, etc., and the CRC to the final drive. For
> > high-performance SCSI systems, whose bus IO is faster than the drives
> > (and most SCSI IO chains ARE faster than the drives attached to them),
> > the drives actually write in PARALLEL. I can give you a more detailed
> > example, but suffice it to say that with RAID5, writing 7 bytes to 7 data
> > drives takes about the same time as writing 3 or 4 bytes to a single
> > non-RAID drive. That, my friends, is why RAID5 (especially when done in
> > hardware) actually improves performance.
> >
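A full-stripe write like the one described above amounts to splitting the data into one chunk per data drive plus an XOR parity chunk, each destined for a different drive so the writes can overlap. A 7-drive RAID 5 stripe holds 6 data chunks plus parity, and in RAID 5 the parity position usually rotates from stripe to stripe (a fixed parity drive is RAID 4). This Python sketch assumes 1-byte chunks and a simple rotation rule; real controllers and software RAID offer several layouts, so `layout_stripe` and its rotation are illustrative assumptions.

```python
from functools import reduce


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def layout_stripe(data: bytes, n_drives: int, stripe_index: int, chunk: int = 1):
    """Return (per-drive chunks, parity drive) for one full RAID 5 stripe.

    Hypothetical layout: parity sits on drive (n_drives - 1 - stripe_index)
    mod n_drives, and data chunks fill the remaining drives in order.
    """
    n_data = n_drives - 1
    if n_drives < 3 or len(data) != n_data * chunk:
        raise ValueError("need >= 3 drives and exactly one chunk per data drive")
    chunks = [data[i * chunk:(i + 1) * chunk] for i in range(n_data)]
    parity = reduce(xor, chunks)
    parity_drive = (n_drives - 1 - stripe_index) % n_drives
    data_iter = iter(chunks)
    # Each list entry goes to a different drive, so the writes can proceed in parallel.
    per_drive = [parity if d == parity_drive else next(data_iter)
                 for d in range(n_drives)]
    return per_drive, parity_drive


drives, p = layout_stripe(b"abcdef", n_drives=7, stripe_index=0)
print(p, drives)  # parity on drive 6
drives, p = layout_stripe(b"ghijkl", n_drives=7, stripe_index=1)
print(p, drives)  # parity moves to drive 5
```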
> > Terry Fielder
> > Network Engineer
> > Great Gulf Homes / Ashton Woods Homes
> > terry(at)greatgulfhomes(dot)com
> >
> > > -----Original Message-----
> > > From: pgsql-general-owner(at)postgresql(dot)org
> > > [mailto:pgsql-general-owner(at)postgresql(dot)org]On Behalf Of Kurt Gunderson
> > > Sent: Thursday, May 30, 2002 12:59 PM
> > > Cc: pgsql-general(at)postgresql(dot)org
> > > Subject: Re: [GENERAL] Scaling with memory & disk planning
> > >
> > >
> > > Bear in mind that I am a newbie to the PostgreSQL world but have
> > > experience in other RDBMSs when I ask this question:
> > >
> > > If you are looking for the best performance, why go with a RAID5 as
> > > opposed to a RAID1+0 (mirrored stripes) solution? Understandably, RAID5
> > > is a cheaper solution requiring fewer drives for redundancy, but from my
> > > experience, RAID5 chokes horribly under heavy disk writing. RAID5 always
> > > requires at least two write operations for every block written: one for
> > > the data and one for the redundancy algorithm.
> > >
> > > Is this wrong?
> > >
> > > (I mean no disrespect)
> > >
> > > Tom Lane wrote:
> > >
> > > > Doug Fields <dfields-pg-general(at)pexicom(dot)com> writes:
> > > >
> > > >>d) How much extra performance does having the log or indices on a
> > > >>different disk buy you, esp. in the instance where you are inserting
> > > >>millions of records into a table? An indexed table?
> > > >>
> > > >
> > > > Keeping the logs on a separate drive is a big win, I believe, for heavy
> > > > update situations. (For read-only queries, of course the log doesn't
> > > > matter.)
> > > >
> > > > Keeping indexes on a separate drive is also traditional database
> > > > advice, but I don't have any feeling for how much it matters in
> > > > Postgres.
> > > >
> > > >
> > > >>a) Run everything on one 7-drive RAID 5 partition (8th drive as hot spare)
> > > >>b) Run logs as a 2-drive mirror and the rest on a 5-drive RAID 5
> > > >>c) Run logs on a 2-drive mirror, indices on a 2-drive mirror, and the
> > > >>rest on a 3-drive RAID5?
> > > >>d) Run logs & indices on a 2-drive mirror and the rest on a 5-drive RAID 5
> > > >>
> > > >
> > > > You could probably get away without mirroring the indices, if you are
> > > > willing to incur a little downtime to rebuild them after an index drive
> > > > failure. So another possibility is
> > > >
> > > > 2-drive mirror for log, 1 plain old drive for indexes, rest for data.
> > > >
> > > > If your data will fit on 2 drives then you could mirror both, still have
> > > > your 8th drive as hot spare, and feel pretty secure.
> > > >
> > > > Note that while it is reasonably painless to configure PG with WAL logs
> > > > in a special place (after initdb, move the pg_xlog subdirectory and make
> > > > a symlink to its new location), it's not currently easy to separate
> > > > indexes from data. So the most practical approach in the short term is
> > > > probably your (b).
> > > >
> > > > regards, tom lane
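The pg_xlog move Tom describes can be sketched as follows. This is a minimal Python illustration, not a PostgreSQL tool: it assumes the server is stopped, and `relocate_with_symlink` and the paths in the usage comment are hypothetical. (In PostgreSQL 10 and later the directory is named pg_wal rather than pg_xlog.)

```python
import os
import shutil


def relocate_with_symlink(data_dir: str, subdir: str, new_parent: str) -> str:
    """Move data_dir/subdir to new_parent/subdir and leave a symlink behind.

    Assumes the server is stopped: moving the WAL directory under a running
    postmaster is unsafe.
    """
    src = os.path.join(data_dir, subdir)
    dst = os.path.join(new_parent, subdir)
    if os.path.islink(src):
        raise ValueError(f"{src} is already a symlink")
    if not os.path.isdir(src):
        raise FileNotFoundError(src)
    if os.path.exists(dst):
        raise FileExistsError(dst)
    shutil.move(src, dst)  # renames, or copies then deletes across filesystems
    os.symlink(dst, src)   # the old path now points at the new location
    return dst


# Hypothetical usage, with paths chosen only for illustration:
# relocate_with_symlink("/usr/local/pgsql/data", "pg_xlog", "/mnt/logdisk")
```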
> > > >
> > > > ---------------------------(end of broadcast)---------------------------
> > > > TIP 6: Have you searched our list archives?
> > > >
> > > > http://archives.postgresql.org
> > > >
> > > >
> > >
> > >
> > > --
> > > Kurt Gunderson
> > > Senior Programmer
> > > Applications Development
> > > Lottery Group
> > > Canadian Bank Note Company, Limited
> > > Email: kgunders(at)cbnlottery(dot)com
> > > Phone: 613.225.6566 x326
> > > Fax: 613.225.6651
> > > http://www.cbnco.com/
> > >
> > > "Entropy isn't what it used to be"
> > >
> > > Obtaining any information from this message for the purpose of sending
> > > unsolicited commercial Email is strictly prohibited. Receiving this
> > > email does not constitute a request of or consent to send unsolicited
> > > commercial Email.
> > >
> > >
> >
>
>
