Re: SAN performance mystery

From: Tim Allen <tim(at)proximity(dot)com(dot)au>
To: pgsql-performance(at)lusis(dot)org
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: SAN performance mystery
Date: 2006-06-19 10:28:07
Message-ID: 44967C37.3030006@proximity.com.au
Lists: pgsql-performance
John Vincent wrote:
>     <snipped>
>     Is that expected performance, anyone? It doesn't sound right to me. Does
>     anyone have any clues about what might be going on? Buggy kernel
>     drivers? Buggy kernel, come to think of it? Does a SAN just not provide
>     adequate performance for a large database?
> 
> Tim,
> 
> Here are the areas I would look at first if we're considering hardware 
> to be the problem:
> 
> HBA and driver:
>    Since this is an Intel/Linux system, the HBA is PROBABLY a qlogic. I 
> would need to know the SAN model to see what the backend of the SAN is 
> itself. EMC has some FC-attach models that actually have SATA disks 
> underneath. You also might want to look at the cache size of the 
> controllers on the SAN.

As I noted in another thread, the HBA is an Emulex LP1050, and they have 
a rather old driver for it. I've recommended that they update ASAP. This 
hasn't happened yet.

I know very little about the SAN itself - the customer hasn't provided 
any information other than the brand name, as they selected it and 
installed it themselves. I shall ask for more information.

>    - Something also to note is that EMC provides an add-on called 
> PowerPath for load balancing multiple HBAs. If they don't have this, it 
> might be worth investigating.

OK, thanks, I'll ask the customer whether they've used PowerPath at all. 
They do seem to have it installed on the machine, but I suppose that 
doesn't guarantee it's being used correctly. However, it looks like they 
have just the one HBA, so, if I've correctly understood what load 
balancing means in this context, it's not going to help; right?

>   - As with anything, disk layout is important. With the lower end IBM 
> SAN (DS4000) you actually have to operate on physical spindle level. On 
> our 4300, when I create a LUN, I select the exact disks I want and which 
> of the two controllers is the preferred path. On our DS6800, I just ask 
> for storage. I THINK all the EMC models are the "ask for storage" type 
> of scenario. However with the 6800, you select your storage across 
> extent pools.
> 
> Have they done any benchmarking of the SAN outside of postgres? Before 
> we settle on a new LUN configuration, we always do the 
> dd,umount,mount,dd routine. It's not a perfect test for databases but it 
> will help you catch GROSS performance issues.

I've done some dd'ing myself, as described in another thread. The 
results are not at all encouraging - their SAN seems to do about 20MB/s 
or less.
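
For anyone wanting to repeat a rough sequential-write check without dd, 
a minimal Python sketch follows. It is not the dd invocation used in the 
other thread: the function name, path argument and sizes are assumptions 
chosen for illustration, and it covers only the write half of the 
dd,umount,mount,dd routine (no read-back after a remount).

```python
import os
import time


def sequential_write_mb_per_s(path, total_mb=64, block_kb=1024):
    """Write total_mb of zeros to path in block_kb chunks and return MB/s.

    The file is fsync'ed before the clock stops, so the figure includes
    the time taken to push the data out of the OS page cache.
    """
    block = b"\0" * (block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
    # Guard against a zero interval on very small test files.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return (blocks * block_kb / 1024) / elapsed


# Hypothetical usage; the mount point is an assumption:
# print(sequential_write_mb_per_s("/mnt/san/ddtest.bin", total_mb=4096))
```

The test file should be considerably larger than the machine's RAM (and 
ideally than the SAN controller cache), otherwise the number mostly 
reflects caching rather than the disks.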

> SAN itself:
>   - Could the SAN be oversubscribed? How many hosts and LUNs total do 
> they have and what are the queue_depths for those hosts? With the qlogic 
> card, you can set the queue depth in the BIOS of the adapter when the 
> system is booting up. CTRL-Q I think.  If the system has enough local 
> DASD to relocate the database internally, it might be a valid test to do 
> so and see if you can isolate the problem to the SAN itself.

The SAN possibly is over-subscribed. Can you suggest any easy ways for 
me to find out? The customer has an IT department who look after their 
SANs, and they're not keen on outsiders poking their noses in. It's hard 
for me to get any direct access to the SAN itself.
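
One thing that can be inspected from the host side, without touching the 
SAN, is the queue depth Linux is using for each SCSI device. The sketch 
below assumes a 2.6-series kernel exposing 
/sys/block/<dev>/device/queue_depth; the function name and the 
sysfs_root parameter are hypothetical, and it says nothing about how many 
other hosts share the array.

```python
import glob
import os


def scsi_queue_depths(sysfs_root="/sys"):
    """Return {device_name: queue_depth} for block devices that expose one.

    Devices without a device/queue_depth attribute are simply not listed.
    """
    depths = {}
    pattern = os.path.join(sysfs_root, "block", "*", "device", "queue_depth")
    for attr_path in sorted(glob.glob(pattern)):
        # Path layout: <root>/block/<dev>/device/queue_depth
        dev = attr_path.split(os.sep)[-3]
        with open(attr_path) as f:
            depths[dev] = int(f.read().strip())
    return depths


# Hypothetical usage on the database host:
# for dev, depth in scsi_queue_depths().items():
#     print(dev, depth)
```

Whether a given depth is too high or too low depends on how many hosts 
and LUNs share the array, which only the customer's IT department can 
say.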

> PG itself:
>  
>  If you think it's a pgsql configuration, I'm guessing you already 
> configured postgresql.conf to match theirs (or at least a fraction of 
> theirs since the memory isn't the same?). What about loading a 
> "from-scratch" config file and restarting the tuning process?

The pg configurations are not identical. However, given the differences 
in raw I/O speed observed, it doesn't seem likely that the difference in 
configuration is responsible. Yes, as you guessed, we set more 
conservative options on the less capable box. Doing proper double-blind 
tests on the customer box is difficult, as it is in production and the 
customer has a very low tolerance for downtime.

> Just a dump of my thought process from someone who's been spending too 
> much time tuning his SAN and postgres lately.

Thanks for all the suggestions, John. I'll keep trying to follow some of 
them up.

Tim

-- 
-----------------------------------------------
Tim Allen          tim(at)proximity(dot)com(dot)au
Proximity Pty Ltd  http://www.proximity.com.au/
