Re: H800 + md1200 Performance problem

From: Cesar Martin <cmartinp(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: H800 + md1200 Performance problem
Date: 2012-04-04 09:42:20
Message-ID: CAMAsR=7onjeWr--PtgHgfZv=yYSB8FVxf1BsYSwu2752YY0Q8w@mail.gmail.com
Lists: pgsql-performance

Hello,

Yesterday I changed the kernel setting that Scott suggested,
vm.zone_reclaim_mode = 0. I have run new benchmarks and noticed
changes, at least in Postgres:

First run:
EXPLAIN ANALYZE SELECT * from company_news_internet_201111;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on company_news_internet_201111 (cost=0.00..369577.79
rows=6765779 width=323) (actual time=0.020..7984.707 rows=6765779 loops=1)
Total runtime: 12699.008 ms
(2 rows)

Second run:
EXPLAIN ANALYZE SELECT * from company_news_internet_201111;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on company_news_internet_201111 (cost=0.00..369577.79
rows=6765779 width=323) (actual time=0.023..1767.440 rows=6765779 loops=1)
Total runtime: 2696.901 ms

It seems the data is now being cached properly.

The large query takes 80 seconds on the first run and around 23 seconds on
the second. This is not spectacular, but it is better than yesterday.

Furthermore, the dd results are strange:

dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 803,738 s, 171 MB/s

I think 171 MB/s is a poor value for a 12-disk SAS RAID10. And when I run
iostat during the dd run, I get results like this (columns: device, tps,
MB_read/s, MB_wrtn/s, MB_read, MB_wrtn):
sdc 1514,62 0,01 108,58 11 117765
sdc 3705,50 0,01 316,62 0 633
sdc 2,00 0,00 0,05 0 0
sdc 920,00 0,00 63,49 0 126
sdc 8322,50 0,03 712,00 0 1424
sdc 6662,50 0,02 568,53 0 1137
sdc 0,00 0,00 0,00 0 0
sdc 1,50 0,00 0,04 0 0
sdc 6413,00 0,01 412,28 0 824
sdc 13107,50 0,03 867,94 0 1735
sdc 0,00 0,00 0,00 0 0
sdc 1,50 0,00 0,03 0 0
sdc 9719,00 0,03 815,49 0 1630
sdc 2817,50 0,01 272,51 0 545
sdc 1,50 0,00 0,05 0 0
sdc 1181,00 0,00 71,49 0 142
sdc 7225,00 0,01 362,56 0 725
sdc 2973,50 0,01 269,97 0 539

I don't understand why MB_wrtn/s keeps swinging between 0 and nearly
800 MB/s throughout the run.
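
One way to put numbers on that swing, as a rough sketch rather than a diagnosis: the snippet below takes the MB_wrtn/s column from the samples above, converts the comma decimals, and reports the mean, the peak, and how many intervals were essentially idle. The first iostat line is left out because iostat's first report is an average since boot. The name summarize and the 1 MB/s idle threshold are illustrative choices, not anything defined by iostat.

```python
# MB_wrtn/s samples copied from the iostat output above (comma decimals kept as-is).
# The first line of that output is skipped: it is iostat's since-boot average.
RAW = """316,62 0,05 63,49 712,00 568,53 0,00 0,04 412,28 867,94
0,00 0,03 815,49 272,51 0,05 71,49 362,56 269,97"""


def summarize(raw, idle_below=1.0):
    """Return count, mean, peak, and idle-interval count for comma-decimal samples."""
    samples = [float(tok.replace(",", ".")) for tok in raw.split()]
    idle = sum(1 for s in samples if s < idle_below)  # intervals with almost no writes
    return {
        "n": len(samples),
        "mean": sum(samples) / len(samples),
        "peak": max(samples),
        "idle": idle,
    }


stats = summarize(RAW)
print(f"{stats['n']} samples, mean {stats['mean']:.1f} MB/s, "
      f"peak {stats['peak']:.1f} MB/s, {stats['idle']} idle intervals")
```

A few very high intervals separated by near-idle ones is the shape one would expect if writes are collected in the page cache and flushed in bursts, which is what Tomas describes in the message quoted below. The sample window here is short, so the mean should not be compared directly with the 171 MB/s that dd reports for the whole run.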

Read results:

dd if=/vol02/bonnie/DD of=/dev/null bs=8M count=16384
16384+0 records in
16384+0 records out
137438953472 bytes (137 GB) copied, 257,626 s, 533 MB/s

sdc 3157,00 392,69 0,00 785 0
sdc 3481,00 432,75 0,00 865 0
sdc 2669,50 331,50 0,00 663 0
sdc 3725,50 463,75 0,00 927 0
sdc 2998,50 372,38 0,00 744 0
sdc 3600,50 448,00 0,00 896 0
sdc 3588,00 446,50 0,00 893 0
sdc 3494,00 434,50 0,00 869 0
sdc 3141,50 390,62 0,00 781 0
sdc 3667,50 456,62 0,00 913 0
sdc 3429,35 426,18 0,00 856 0
sdc 3043,50 378,06 0,00 756 0
sdc 3366,00 417,94 0,00 835 0
sdc 3480,50 432,62 0,00 865 0
sdc 3523,50 438,06 0,00 876 0
sdc 3554,50 441,88 0,00 883 0
sdc 3635,00 452,19 0,00 904 0
sdc 3107,00 386,20 0,00 772 0
sdc 3695,00 460,00 0,00 920 0
sdc 3475,50 432,11 0,00 864 0
sdc 3487,50 433,50 0,00 867 0
sdc 3232,50 402,39 0,00 804 0
sdc 3698,00 460,67 0,00 921 0
sdc 5059,50 632,00 0,00 1264 0
sdc 3934,00 489,56 0,00 979 0
sdc 4536,50 566,75 0,00 1133 0
sdc 5298,00 662,12 0,00 1324 0

These results look more reasonable to me. Read speed stays steady
throughout the test.
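
As a cross-check of the two dd summary lines above, here is a small sketch that recomputes the throughput from the byte count and the elapsed time. dd on this machine prints decimals with a comma, so the parser normalizes that first. parse_dd_summary is a hypothetical helper written for this message, and it assumes the exact summary layout shown above.

```python
import re

# Summary lines copied verbatim from the two dd runs above.
WRITE = "137438953472 bytes (137 GB) copied, 803,738 s, 171 MB/s"
READ = "137438953472 bytes (137 GB) copied, 257,626 s, 533 MB/s"


def parse_dd_summary(line):
    """Return (bytes, seconds, MB/s) recomputed from a dd summary line.

    Assumes the "<N> bytes (...) copied, <T> s, ..." layout shown above, with
    either "." or "," as the decimal separator. MB means 10**6 bytes, as in dd.
    """
    m = re.match(r"(\d+) bytes .* copied, ([\d.,]+) s,", line)
    if m is None:
        raise ValueError(f"unrecognized dd summary: {line!r}")
    nbytes = int(m.group(1))
    seconds = float(m.group(2).replace(",", "."))
    return nbytes, seconds, nbytes / seconds / 1e6


for label, line in (("write", WRITE), ("read", READ)):
    nbytes, seconds, mbps = parse_dd_summary(line)
    print(f"{label}: {nbytes / 2**30:.0f} GiB in {seconds:.0f} s = {mbps:.0f} MB/s")
```

The file is 128 GiB (bs=8M count=16384), about twice the 64 GB of RAM Tomas mentions, so these runs follow his suggestion of writing roughly 2x the RAM to limit the effect of the page cache.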

About the "conv=fdatasync" parameter that Tomas mentions: I saw it at
http://romanrm.ru/en/dd-benchmark and started using it, but it may be
wrong. Before that I used time sh -c "dd if=/dev/zero of=ddfile bs=X count=Y &&
sync".
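
Both variants aim at the same thing: keeping the clock running until the data has actually been flushed, not just handed to the page cache. As far as I can tell, GNU dd documents conv=fdatasync as physically writing the output file data before dd finishes. Below is a minimal Python sketch of that idea, not a replacement for dd. The helper name timed_write, the temporary path, and the small sizes are placeholders chosen so it is safe to run anywhere. It uses os.fdatasync where available and falls back to os.fsync otherwise.

```python
import os
import tempfile
import time


def timed_write(path, block_size, count, sync=True):
    """Write count zero-filled blocks to path and return (bytes, seconds).

    With sync=True the data is flushed to the device before the clock stops,
    which is the effect both dd variants above are after.
    """
    flush_to_disk = getattr(os, "fdatasync", os.fsync)  # fdatasync is not on every platform
    block = bytes(block_size)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)
        f.flush()  # push Python's own buffer to the kernel
        if sync:
            flush_to_disk(f.fileno())  # ask the kernel to write it out
    return block_size * count, time.perf_counter() - start


if __name__ == "__main__":
    # Small placeholder sizes; a real test needs far more data than this.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "ddfile")
        for sync in (False, True):
            nbytes, secs = timed_write(target, 1 << 20, 64, sync=sync)
            print(f"sync={sync}: {nbytes / secs / 1e6:.0f} MB/s")
```

With sizes this small the unsynced figure mostly measures memory speed, which is the same effect that makes short dd runs without a final sync look faster than the disks really are.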

What is your opinion of the results?

I have noticed that since I changed the setting vm.zone_reclaim_mode = 0,
swap is completely full. Do you recommend that I disable swap?
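
To quantify "completely full" before changing anything, a small sketch that reads swap usage from /proc/meminfo. It assumes a Linux system where that file has SwapTotal and SwapFree lines in kB. swap_usage is an illustrative helper, and it only reports usage without touching any setting.

```python
def swap_usage(meminfo_text):
    """Return (used_kB, total_kB) parsed from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            fields[name.strip()] = int(rest.split()[0])  # first token is the value
    total = fields["SwapTotal"]
    return total - fields["SwapFree"], total


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        used, total = swap_usage(f.read())
    pct = 100.0 * used / total if total else 0.0
    print(f"swap used: {used} kB of {total} kB ({pct:.0f}%)")
```

Running it before and after a benchmark would show whether swap fills up during the test or was already full beforehand.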

Thanks!!

On 3 April 2012 20:01, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:

> On 3.4.2012 17:42, Cesar Martin wrote:
> > Yes, setting is the same in both machines.
> >
> > The results of bonnie++ running without arguments are:
> >
> > Version  1.96        ------Sequential Output------    --Sequential Input-   --Random-
> >                      -Per Chr-  --Block--  -Rewrite-  -Per Chr-  --Block--  --Seeks--
> > Machine        Size  K/sec %CP  K/sec %CP  K/sec %CP  K/sec %CP  K/sec %CP   /sec %CP
> > cltbbdd01      126G     94  99 202873  99 208327  95   1639  91 819392  88   2131 139
> > Latency                88144us      228ms      338ms      171ms      147ms    20325us
> >                      ------Sequential Create------    --------Random Create--------
> >                      -Create--  --Read---  -Delete--  -Create--  --Read---  -Delete--
> > files:max:min         /sec %CP   /sec %CP   /sec %CP   /sec %CP   /sec %CP   /sec %CP
> > cltbbdd01        16   8063  26  +++++ +++  27361  96  31437  96  +++++ +++  +++++ +++
> > Latency                 7850us     2290us     2310us      530us       11us      522us
> >
> > With dd, one CPU core goes to 100% and the results are about 100-170
> > MB/s, which I think is a bad result for this hardware:
> >
> > dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=100
> > 100+0 records in
> > 100+0 records out
> > 838860800 bytes (839 MB) copied, 8,1822 s, 103 MB/s
> >
> > dd if=/dev/zero of=/vol02/bonnie/DD bs=8M count=1000 conv=fdatasync
> > 1000+0 records in
> > 1000+0 records out
> > 8388608000 bytes (8,4 GB) copied, 50,8388 s, 165 MB/s
> >
> > dd if=/dev/zero of=/vol02/bonnie/DD bs=1M count=1024 conv=fdatasync
> > 1024+0 records in
> > 1024+0 records out
> > 1073741824 bytes (1,1 GB) copied, 7,39628 s, 145 MB/s
> >
> > When I monitor I/O activity with iostat during dd, I notice that,
> > if the test takes 10 seconds, the disk shows activity only during the last
> > 3 or 4 seconds and iostat reports about 250-350 MB/s. Is that normal?
>
> Well, you're testing writing, and the default behavior is to write the
> data into page cache. And you do have 64GB of RAM so the write cache may
> take a large portion of the RAM - even gigabytes. To really test the I/O
> you need to (a) write about 2x the amount of RAM or (b) tune the
> dirty_ratio/dirty_background_ratio accordingly.
>
> BTW what are you trying to achieve with "conv=fdatasync" at the end? My
> dd man page does not mention 'fdatasync' and IMHO it's a mistake on your
> side. If you want to sync the data at the end, then you need to do
> something like
>
> time sh -c "dd ... && sync"
>
> > I set read ahead to different values, but the results don't differ
> > substantially...
>
> Because read-ahead is for reading (which is what a SELECT does most of
> the time), but the tests above are writing to the device. And writing is
> not influenced by read-ahead.
>
> To test reading, do this:
>
> dd if=/vol02/bonnie/DD of=/dev/null bs=8M count=1024
>
> Tomas

--
César Martín Pérez
cmartinp(at)gmail(dot)com
