
Re: [PERFORM] Quad processor options - summary

From: Paul Tuckfield <paul(at)tuckfield(dot)com>
To: Bjoern Metzdorf <bm(at)turtle-entertainment(dot)de>
Cc: pgsql-admin(at)postgresql(dot)org,James Thornton <james(at)jamesthornton(dot)com>,pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Quad processor options - summary
Date: 2004-05-14 00:51:42
Message-ID: DD8618E1-A540-11D8-A34F-000393BD6C3E@tuckfield.com
Lists: pgsql-admin, pgsql-performance

One big caveat regarding the "SAME" striping strategy is that readahead
can really hurt you in an OLTP workload.

Mind you, if you're going from a few disks to a caching array with many
disks, it'll be hard not to see a big improvement.

But if you push the envelope of the array with a "SAME" configuration,
readahead will hurt. Readahead is good for sequential reads but bad
for random reads, because the various caches (array and filesystem) get
flooded with all the blocks that happen to come after whatever random
blocks you're reading. Because the reads are random, these extra
blocks are generally *not* read by subsequent queries if the database
is much larger than the cache itself. Of course, the readahead blocks
are good if you're doing sequential scans, but you're not doing
sequential scans because it's an OLTP database, right?
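
The cache-flooding effect can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not a model of any real array: it assumes a single LRU cache, a small "hot" set of blocks scattered randomly over a much larger dataset (standing in for an OLTP working set), and a fixed readahead length. The function name `simulate` and all parameter values are made up for illustration.

```python
import random
from collections import OrderedDict


def simulate(readahead, num_blocks=100_000, hot_blocks=2_000,
             cache_blocks=4_000, num_reads=100_000, hot_fraction=0.9,
             seed=1):
    """Return (cache hit rate, blocks transferred from disk per read).

    Assumptions (all hypothetical): one LRU cache of `cache_blocks`
    blocks; `hot_fraction` of the reads go to a hot set scattered at
    random over the dataset, the rest are uniform random; every miss
    loads the requested block plus the next `readahead` blocks.
    """
    rng = random.Random(seed)
    hot = rng.sample(range(num_blocks), hot_blocks)
    cache = OrderedDict()          # keys are block numbers, oldest first
    hits = 0
    fetched = 0
    for _ in range(num_reads):
        if rng.random() < hot_fraction:
            block = rng.choice(hot)
        else:
            block = rng.randrange(num_blocks)
        if block in cache:
            hits += 1
            cache.move_to_end(block)   # mark as most recently used
            continue
        # Miss: load the block and its readahead neighbours.
        last = min(block + readahead, num_blocks - 1)
        for b in range(block, last + 1):
            fetched += 1
            cache[b] = None
            cache.move_to_end(b)
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used
    return hits / num_reads, fetched / num_reads


if __name__ == "__main__":
    for ra in (0, 4, 16):
        hit_rate, io_per_read = simulate(readahead=ra)
        print(f"readahead={ra:2d}  hit rate={hit_rate:.2f}  "
              f"blocks fetched per read={io_per_read:.2f}")
```

In this model the hot set fits in the cache when readahead is off, so most reads hit. With readahead on, every miss drags in neighbouring blocks that are unlikely to be read again, and they push hot blocks out. A sequential-scan workload would show the opposite picture.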


So this'll probably incite flames, but:
In an OLTP environment of decent size, readahead is bad. The ideal
would be to adjust it dynamically until you reach the optimum (likely
no readahead), if the array allows it. But most people are fooled by
the good performance of readahead on simple single-threaded or
small-dataset tests, and get bitten by it under concurrent loads or
large datasets.


James Thornton wrote:
>
>>> This is what I am considering the ultimate platform for postgresql:
>>>
>>> Hardware:
>>> Tyan Thunder K8QS board
>>> 2-4 x Opteron 848 in NUMA mode
>>> 4-8 GB RAM (DDR400 ECC Registered 1 GB modules, 2 for each processor)
>>> LSI Megaraid 320-2 with 256 MB cache ram and battery backup
>>> 6 x 36GB SCSI 10K drives + 1 spare running in RAID 10, split over 
>>> both channels (3 + 4) for pgdata including indexes and wal.
>> You might also consider configuring the Postgres data drives for a 
>> RAID 10 SAME configuration as described in the Oracle paper "Optimal 
>> Storage Configuration Made Easy" 
>> (http://otn.oracle.com/deploy/availability/pdf/oow2000_same.pdf). Has 
>> anyone delved into this before?
>
> Ok, if I understand it correctly, the paper recommends the following:
>
> 1. Get many drives and stripe them into a RAID0 with a stripe width of 
> 1MB. I am not quite sure if this stripe width is to be controlled at 
> the application level (does postgres support this?) or if e.g. the 
> "chunk size" of the linux software driver is meant. Normally a chunk 
> size of 4KB is recommended, so 1MB sounds fairly large.
>
> 2. Mirror your RAID0 and get a RAID10.
>
> 3. Use primarily the fast, outer regions of your disks. In practice 
> this might be achieved by putting only half of the disk (the outer 
> half) into your stripe set. E.g. put only the outer 18GB of your 36GB 
> disks into the stripe set. Btw, is it common for all drives that the 
> outer region is on the higher block numbers? Or is it sometimes on the 
> lower block numbers?
>
> 4. Subset data by partition, not disk. If you have 8 disks, then don't
> take a 4 disk RAID10 for data and the other one for log or indexes,
> but make a global 8 drive RAID10 and partition it so that
> data and log + indexes are located on all drives.
>
> Interestingly, and contrary to what is normally recommended, they say
> that one big stripe set over all available disks is as good as or
> better than putting log + indexes on a separate stripe set. Having
> one big stripe set means that the speed of this big stripe set is
> available to all data. In practice this setup is as fast as or even
> faster than the "old" approach.
>
> ----------------------------------------------------------------
>
> Bottom line for a normal, less than 10 disk setup:
>
> Get many disks (8 + spare), create a RAID0 with 4 disks and mirror it 
> to the other 4 disks for a RAID10. Make sure to create the RAID on the 
> outer half of the disks (setup may depend on the disk model and raid 
> controller used), leaving the inner half empty.
> Use a logical volume manager (LVM), which always helps when adding 
> disk space, and create 2 partitions on your RAID10. One for data and 
> one for log + indexes. This should look like this:
>
> ----- ----- ----- -----
> | 1 | | 1 | | 1 | | 1 |
> ----- ----- ----- -----  <- outer, faster half of the disk
> | 2 | | 2 | | 2 | | 2 |     part of the RAID10
> ----- ----- ----- -----
> |   | |   | |   | |   |
> |   | |   | |   | |   |  <- inner, slower half of the disk
> |   | |   | |   | |   |     not used at all
> ----- ----- ----- -----
>
> Partition 1 for data, partition 2 for log + indexes. All mirrored to 
> the other 4 disks not shown.
>
> If you take 36GB disks, this should end up like this:
>
> RAID10 has size of 36 / 2 * 4 = 72GB
> Partition 1 is 36 GB
> Partition 2 is 36 GB
>
> If 36GB is not enough for your pgdata set, you might consider moving
> to 72GB disks, or (even better) making a 16 drive RAID10 out of 36GB
> disks. Both will give you 72GB for your data (but the
> 16 drive version will be faster).
>
> Any comments?
>
> Regards,
> Bjoern
>
>
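
The sizing arithmetic in the quoted summary (36 / 2 * 4 = 72GB) generalizes to the other layouts it mentions. The following is a minimal sketch, assuming that only the outer half of each disk is used and that the array is split into two equal partitions, as in the quoted diagram. The function names are hypothetical.

```python
def raid10_usable_gb(num_disks, disk_gb, used_fraction=0.5):
    """Usable size of a RAID10 built from `num_disks` identical disks.

    Half of the disks hold mirror copies, and only `used_fraction`
    of each disk (the outer region) is put into the stripe set.
    """
    if num_disks < 2 or num_disks % 2 != 0:
        raise ValueError("RAID10 needs an even number of disks")
    stripe_members = num_disks // 2
    return stripe_members * disk_gb * used_fraction


def per_partition_gb(num_disks, disk_gb, partitions=2):
    """Size of each partition when the array is split evenly."""
    return raid10_usable_gb(num_disks, disk_gb) / partitions


if __name__ == "__main__":
    for disks, size in ((8, 36), (8, 72), (16, 36)):
        print(f"{disks} x {size}GB: array "
              f"{raid10_usable_gb(disks, size):.0f}GB, data partition "
              f"{per_partition_gb(disks, size):.0f}GB")
```

Under these assumptions, 8 x 72GB disks and 16 x 36GB disks both give a 144GB array and a 72GB data partition, which matches the figures in the summary.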

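
On the question of what a 1MB stripe unit means in practice, it may help to see how a plain RAID0 maps a logical byte offset onto its member disks. This is a sketch of the usual round-robin chunk layout, not of any particular controller or of the Linux software RAID driver. The names `locate` and `disks_touched` are made up, and the 8KB read size in the usage lines is PostgreSQL's default block size.

```python
MB = 1024 * 1024


def locate(offset, num_disks, chunk_size=MB):
    """Map a logical byte offset to (disk index, byte offset on that disk).

    Assumes a simple RAID0 layout: chunks of `chunk_size` bytes are
    dealt out to the disks round-robin.
    """
    chunk = offset // chunk_size
    disk = chunk % num_disks
    row = chunk // num_disks          # which stripe row the chunk is in
    return disk, row * chunk_size + offset % chunk_size


def disks_touched(offset, length, num_disks, chunk_size=MB):
    """Set of disks that a read of `length` bytes at `offset` involves."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return {c % num_disks for c in range(first, last + 1)}


if __name__ == "__main__":
    # A single aligned 8KB page read stays on one disk ...
    print(disks_touched(3 * 8192, 8192, num_disks=4))
    # ... while a 4MB sequential read is spread over all four.
    print(disks_touched(0, 4 * MB, num_disks=4))
```

With a large chunk, each small random read is served by a single spindle, so concurrent random reads can proceed on different disks, while large sequential reads still spread over the whole stripe set.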
