Re: 8K recordsize bad on ZFS?

From: Jignesh Shah <jkshah(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: 8K recordsize bad on ZFS?
Date: 2010-05-08 21:39:02
Message-ID: AANLkTimjfVo4s6TKu7-1BN5B-Wn4m3SnsUduRmaDxFAV@mail.gmail.com
Lists: pgsql-performance

On Fri, May 7, 2010 at 8:09 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> Jignesh, All:
>
> Most of our Solaris users have been, I think, following Jignesh's advice
> from his benchmark tests to set ZFS page size to 8K for the data zpool.
>  However, I've discovered that this is sometimes a serious problem for
> some hardware.
>
> For example, having the recordsize set to 8K on a Sun 4170 with 8 drives
> recently gave me these appalling Bonnie++ results:
>
> Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   4     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> db111           24G           260044  33 62110  17           89914  15  1167  25
> Latency                        6549ms    4882ms              3395ms     107ms
>
> I know that's hard to read.  What it's saying is:
>
> Seq Writes: 260 MB/s combined
> Seq Reads: 89 MB/s combined
> Read Latency: 3.3s
>
> Best guess is that this is a result of overloading the array/drives with
> commands for all those small blocks; certainly the behavior observed
> (stuttering I/O, latency) is in line with that issue.
>
> Anyway, since this is a DW-like workload, we just bumped the recordsize
> up to 128K and the performance issues went away ... reads up over 300 MB/s.
>
> --
>                                  -- Josh Berkus
>                                     PostgreSQL Experts Inc.
>                                     http://www.pgexperts.com
>

Hi Josh,

The 8K recommendation is for OLTP applications, so if you have seen it
suggested somewhere for DSS/DW workloads, let me know and, if I still
have access to that document, I will correct it. DW workloads need
throughput, and with an 8K recordsize they are limited to 8K x max
IOPS: at a typical ~120 IOPS per SAS drive, 8 drives comes to roughly
8 MB/s. (Prefetching and other read optimizations can push that to
about 24-30 MB/s with 8K records on 12-disk arrays.) So yes, that
advice is typically bad for DSS, and I generally recommend 128K there
instead. For OLTP, however, you generally want more IOPS at low
latency, which is what an 8K recordsize provides (it matches
PostgreSQL's 8K block size).
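
To make that arithmetic concrete, here is a minimal back-of-the-envelope
sketch (the ~120 IOPS per SAS drive and the 8-drive count are the typical
estimates from above, not measured values):

    # throughput ceiling when IOPS-bound: recordsize x IOPS/drive x drives
    echo "8K records:   $(( 8 * 120 * 8 )) KB/s"    # 7680 KB/s  ~= 7.5 MB/s
    echo "128K records: $(( 128 * 120 * 8 )) KB/s"  # 122880 KB/s ~= 120 MB/s

Real numbers land higher once prefetch and ZFS I/O aggregation kick in,
but the 16x scaling with record size is the point.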

Hope this clarifies.
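
For reference, the knob itself is per-dataset (the dataset name below is
hypothetical); note that recordsize only applies to files written after
the change, so existing tables have to be rewritten, e.g. reloaded or
copied, to pick it up:

    zfs get recordsize tank/pgdata          # check the current setting
    zfs set recordsize=128K tank/pgdata     # DSS/DW: favor sequential throughput
    zfs set recordsize=8K tank/pgdata       # OLTP: match PostgreSQL's 8K blocks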

-Jignesh
