Re: Tables on multiple disk drives

From: Craig Thomas <craiger(at)osdl(dot)org>
To: <scott(dot)marlowe(at)ihs(dot)com>
Cc: <craiger(at)osdl(dot)org>, <lists2(at)tokar(dot)ru>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Tables on multiple disk drives
Date: 2004-02-17 18:16:22
Message-ID: 64865.4.5.9.170.1077041782.squirrel@www.osdl.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

> On Tue, 17 Feb 2004, Craig Thomas wrote:
>
>> > On Tue, 17 Feb 2004, Konstantin Tokar wrote:
>> >
>> >> Hi!
>> >> Does PostgreSQL allow to create tables and indices of a single
>> database on multiple disk drives with a purpose of increase
>> >> performance as Oracle database does? If a symbolic reference is the
>> only method then the next question is: how can it be determined
>> what file is referred to what table and index?
>> >
>> > You're life will be simpler, and your setup will be faster without
>> having to muck about with it, if you just buy a good RAID
>> controller with battery backed cache. LSI/Megaraid and Adaptec
>> both make serviceable controllers for reasonable prices, and as you
>> add drives, the speed just goes up, no muddling around with sym
>> links.
>>
>> This works to a limited extent. For very large databases, maximum
>> throughput of I/O is the paramount factor for database performance.
>> With raid controllers, your LUN is still limited to a small number of
>> disks. PostgreSQL can only write on a file system, but Oracle, SAP DB,
>> DB2, etc can write directly to disk (raw I/O). With large databases
>> it is advantageous to spread a table across 100's of disks, if the
>> table is quite large. I don't know of any manufacturer that creates a
>> 100 disk raid array yet.
>
> You can run up to four LSI / Megaraids in one box, each with 3 UW SCSI
> interfaces, and they act as one unit. That's 3*4*15 = 180 disks max.
>
> With FC AL connections and four cards, it would be possible to approach
> 1000 drives.
>
> Of course, I'm not sure how fast any RAID card setup is gonna be with
> that many drives, but ya never know. My guess is that before you go
> there you buy a big external RAID box built for speed. We have a
> couple of 200+ drive external RAID5 storage boxes at work that are
> quite impressive.

That's a good point. But it seems that the databases that are the
leaders of the TPC numbers seem to be the Oracles of the world. I
know that a former company I worked for publised TPC numbers using
Oracle with Raw I/O to get the performance up.

However, it would be interesting for us to conduct a small scale
test using a couple of HW Raid systems configured so that a single
file system can be mounted, then run the OSDL dbt workloads. The
resluts could then be compared with current results that have been
captured.
>
>> Some of the problem can be addressed by using a volume manager (such
>> as LVM in Linux, or Veritas on Unix-like systems). This allows one to
>> create a volume using partitions from many disks. One can then create
>> a file system and mount it on the volume.
>
> Pretty much RAID arrays in software, which means no battery backed
> cache, which means it'll be fast at reading, but probably pretty slow
> at writes, epsecially if there's a lot of parallel access waiting to
> write to the database.
>
>> However, to get the best performance, Raw I/O capability is the best
>> way to go.
>
> Unsupported statement made as fact. I'm not saying it can't or isn't
> true, but my experience has been that large RAID5 arrays are a great
> compromise between maximum performance and reliability, giving a good
> measure of each. It doesn't take 100 drives to do well, even a dozen to
> two dozen will get you in the same basic range as splitting out files
> by hand with sym links without all the headache of chasing down the
> files, shutting down the database, linking it over etc...

Whoops, you're right. I was typing faster than I was thinking. I was
assuming a JBOD set up rather than a RAID storage subsystem. SAN units
such as an EMC or Shark usualy have 4-16 GB cache and thus the I/O's
go pretty quick for really large databases.
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message scott.marlowe 2004-02-17 18:23:09 Re: Tables on multiple disk drives
Previous Message scott.marlowe 2004-02-17 17:53:04 Re: Tables on multiple disk drives