Skip site navigation (1) Skip section navigation (2)

Re: Tables on multiple disk drives

From: Craig Thomas <craiger(at)osdl(dot)org>
To: <scott(dot)marlowe(at)ihs(dot)com>
Cc: <craiger(at)osdl(dot)org>, <lists2(at)tokar(dot)ru>,<pgsql-performance(at)postgresql(dot)org>
Subject: Re: Tables on multiple disk drives
Date: 2004-02-17 18:16:22
Message-ID: 64865.4.5.9.170.1077041782.squirrel@www.osdl.org (view raw or flat)
Thread:
Lists: pgsql-performance
> On Tue, 17 Feb 2004, Craig Thomas wrote:
>
>> > On Tue, 17 Feb 2004, Konstantin Tokar wrote:
>> >
>> >> Hi!
>> >> Does PostgreSQL allow to create tables and indices of a single
>> database on multiple disk drives with a purpose of increase
>> >> performance as Oracle database does? If a symbolic reference is the
>> only method then the next question is: how can it be determined
>> what file is referred to what table and index?
>> >
>> > You're life will be simpler, and your setup will be faster without
>> having  to muck about with it, if you just buy a good RAID
>> controller with battery  backed cache.  LSI/Megaraid and Adaptec
>> both make serviceable controllers  for reasonable prices, and as you
>> add drives, the speed just goes up, no  muddling around with sym
>> links.
>>
>> This works to a limited extent.  For very large databases, maximum
>> throughput of I/O is the paramount factor for database performance.
>> With raid controllers, your LUN is still limited to a small number of
>> disks. PostgreSQL can only write on a file system, but Oracle, SAP DB,
>> DB2, etc can write directly to disk (raw I/O).  With large databases
>> it is advantageous to spread a table across 100's of disks, if the
>> table is quite large.  I don't know of any manufacturer that creates a
>> 100 disk raid array yet.
>
> You can run up to four LSI / Megaraids in one box, each with 3 UW SCSI
> interfaces, and they act as one unit.  That's 3*4*15 = 180 disks max.
>
> With FC AL connections and four cards, it would be possible to approach
> 1000 drives.
>
> Of course, I'm not sure how fast any RAID card setup is gonna be with
> that  many drives, but ya never know.  My guess is that before you go
> there you  buy a big external RAID box built for speed.  We have a
> couple of 200+  drive external RAID5 storage boxes at work that are
> quite impressive.

That's a good point.  But it seems that the databases that are the
leaders of the TPC numbers seem to be the Oracles of the world.  I
know that a former company I worked for publised TPC numbers using
Oracle with Raw I/O to get the performance up.

However, it would be interesting for us to conduct a small scale
test using a couple of HW Raid systems configured so that a single
file system can be mounted, then run the OSDL dbt workloads.  The
resluts could then be compared with current results that have been
captured.
>
>> Some of the problem can be addressed by using a volume manager (such
>> as LVM in Linux, or Veritas on Unix-like systems).  This allows one to
>> create a volume using partitions from many disks.  One can then create
>> a file system and mount it on the volume.
>
> Pretty much RAID arrays in software, which means no battery backed
> cache,  which means it'll be fast at reading, but probably pretty slow
> at writes,  epsecially if there's a lot of parallel access waiting to
> write to the  database.
>
>> However, to get the best performance, Raw I/O capability is the best
>> way to go.
>
> Unsupported statement made as fact.  I'm not saying it can't or isn't
> true, but my experience has been that large RAID5 arrays are a great
> compromise between maximum performance and reliability, giving a good
> measure of each.  It doesn't take 100 drives to do well, even a dozen to
>  two dozen will get you in the same basic range as splitting out files
> by  hand with sym links without all the headache of chasing down the
> files,  shutting down the database, linking it over etc...

Whoops, you're right.  I was typing faster than I was thinking.  I was
assuming a JBOD set up rather than a RAID storage subsystem.  SAN units
such as an EMC or Shark usualy have 4-16 GB cache and thus the I/O's
go pretty quick for really large databases.
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>                http://archives.postgresql.org




In response to

Responses

pgsql-performance by date

Next:From: scott.marloweDate: 2004-02-17 18:23:09
Subject: Re: Tables on multiple disk drives
Previous:From: scott.marloweDate: 2004-02-17 17:53:04
Subject: Re: Tables on multiple disk drives

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group