Re: Two identical systems, radically different performance

From: Craig James <cjames(at)emolecules(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Two identical systems, radically different performance
Date: 2012-10-08 22:29:17
Message-ID: CAFwQ8rfm2AcuoFHDfBLA_hg7ffbsYNsrsaJPDqcOeGqaVW02OQ@mail.gmail.com
Lists: pgsql-performance

One mistake in my descriptions...

On Mon, Oct 8, 2012 at 2:45 PM, Craig James <cjames(at)emolecules(dot)com> wrote:

> This is driving me crazy. A new server, virtually identical to an old
> one, has 50% of the performance with pgbench. I've checked everything I
> can think of.
>
> The setups (call the servers "old" and "new"):
>
> old: 2 x 4-core Intel Xeon E5620
> new: 4 x 4-core Intel Xeon E5606
>

Actually it's not 16 cores. It's 8 cores, hyperthreaded. Hyperthreading
is disabled on the old system.

Is that enough to make this radical difference? (The server is at a
co-location site, so I have to go down there to boot into the BIOS and
disable hyperthreading.)
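
Before making the trip, a quick remote check can confirm whether
hyperthreading is actually enabled -- a sketch, assuming a stock Linux box
with /proc/cpuinfo:

    # If "siblings" (logical CPUs per socket) is double "cpu cores"
    # (physical cores per socket), hyperthreading is on.
    egrep 'siblings|cpu cores' /proc/cpuinfo | sort -u

On the hyperthreaded box, "siblings" will be twice "cpu cores".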

Craig

>
> both:
>
> memory: 12 GB DDR ECC
> Disks: 12x500GB disks (Western Digital 7200RPM SATA)
> 2 disks, RAID1: OS (ext4) and postgres xlog (ext2)
> 8 disks, RAID10: $PGDATA
>
> 3WARE 9650SE-12ML with battery-backed cache. The admin tool (tw_cli)
> indicates that the battery is charged and the cache is working on both
> units.
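>
> (For reference, the spot checks look roughly like this -- syntax from memory
> of tw_cli, so verify it against the 3ware manual; c0/u0 are assumed
> controller and unit numbers:
>
>     tw_cli /c0/bbu show all    # BBU present, charged, healthy?
>     tw_cli /c0/u0 show all     # unit status, including write-cache state
>
> A unit can quietly drop to write-through if the controller distrusts the
> BBU.)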
>
> Linux: 2.6.32-41-server #94-Ubuntu SMP (new server's disk was
> actually cloned from old server).
>
> Postgres: 8.4.4 (yes, I should update. But both are identical.)
>
> The postgresql.conf files are identical; diffs from the original are:
>
> max_connections = 500
> shared_buffers = 1000MB
> work_mem = 128MB
> synchronous_commit = off
> full_page_writes = off
> wal_buffers = 256kB
> checkpoint_segments = 30
> effective_cache_size = 4GB
> track_activities = on
> track_counts = on
> track_functions = none
> autovacuum = on
> autovacuum_naptime = 5min
> escape_string_warning = off
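>
> (A quick runtime cross-check that both servers really are running the same
> non-default settings -- a sketch; run this on each box and diff the results:
>
>     psql -U test -c "SELECT name, setting FROM pg_settings
>                      WHERE source <> 'default' ORDER BY name"
> )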
>
> Note that the old server is in production and was serving a light load
> while this test was running, so in theory it should be slower, not faster,
> than the new server.
>
> pgbench: Old server
>
> pgbench -i -s 100 -U test
> pgbench -U test -c ... -t ...
>
>   -c     -t   TPS
>    5  20000  3777
>   10  10000  2622
>   20   5000  3759
>   30   3333  5712
>   40   2500  5953
>   50   2000  6141
>
> New server
>   -c     -t   TPS
>    5  20000  2733
>   10  10000  2783
>   20   5000  3241
>   30   3333  2987
>   40   2500  2739
>   50   2000  2119
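>
> (Each -c/-t pair holds the total transaction count at roughly 100,000, so a
> driver loop along these lines reproduces the matrix -- a sketch, assuming
> the same "test" user:
>
>     for c in 5 10 20 30 40 50; do
>         pgbench -U test -c $c -t $((100000 / c))
>     done
> )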
>
> As you can see, the new server is dramatically slower than the old one.
>
> I tested both the RAID10 data disk and the RAID1 xlog disk with bonnie++.
> The xlog disks were almost identical in performance. The RAID10 pg-data
> disks looked like this:
>
> Old server:
> Version  1.96   ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine    Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> xenon    24064M   687  99 203098  26 81904  16  3889  96 403747  31 737.6  31
> Latency         20512us     469ms     394ms   21402us     396ms     112ms
> Version  1.96   ------Sequential Create------ --------Random Create--------
> xenon           -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>           files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>              16 15953  27 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
> Latency         43291us     857us     519us    1588us      37us     178us
>
> 1.96,1.96,xenon,1,1349726125,24064M,,687,99,203098,26,81904,16,3889,96,403747,31,737.6,31,16,,,,,15953,27,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,20512us,469ms,394ms,21402us,396ms,112ms,43291us,857us,519us,1588us,37us,178us
>
>
> New server:
> Version  1.96   ------Sequential Output------ --Sequential Input- --Random-
> Concurrency   1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine    Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> zinc     24064M   862  99 212143  54 96008  14  4921  99 279239  17 752.0  23
> Latency         15613us     598ms     597ms    2764us     398ms     215ms
> Version  1.96   ------Sequential Create------ --------Random Create--------
> zinc            -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>           files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
>              16 20380  26 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
> Latency           487us     627us     407us     972us      29us     262us
>
> 1.96,1.96,zinc,1,1349722017,24064M,,862,99,212143,54,96008,14,4921,99,279239,17,752.0,23,16,,,,,20380,26,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,15613us,598ms,597ms,2764us,398ms,215ms,487us,627us,407us,972us,29us,262us
>
> I don't know enough about bonnie++ to say whether these differences matter;
> the one big gap is sequential block input (roughly 400 MB/sec on the old
> server vs. 280 MB/sec on the new), while the other numbers are close.
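>
> (I left out the bonnie++ invocation; output in that format comes from
> something like the line below -- the target directory is a placeholder, and
> -s 24064 matches the Size column, i.e. twice the 12 GB of RAM:
>
>     bonnie++ -d /path/to/raid10-mount -s 24064 -u postgres
> )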
>
> One dramatic difference showed up in vmstat: on the old server, the I/O
> load during the bonnie++ run was steady, like this:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd    free   buff   cache  si  so    bi    bo   in   cs us sy id wa
>  0  2  71800 2117612  17940 9375660   0   0 82948 81944 1992 1341  1  3 86 10
>  0  2  71800 2113328  17948 9383896   0   0 76288 75806 1751 1167  0  2 86 11
>  0  1  71800 2111004  17948 9386540  92   0 93324 94232 2230 1510  0  4 86 10
>  0  1  71800 2106796  17948 9387436 114   0 67698 67588 1572 1088  0  2 87 11
>  0  1  71800 2106724  17956 9387968  50   0 81970 85710 1918 1287  0  3 86 10
>  1  1  71800 2103304  17956 9390700   0   0 92096 92160 1970 1194  0  4 86 10
>  0  2  71800 2103196  17976 9389204   0   0 70722 69680 1655 1116  1  3 86 10
>  1  1  71800 2099064  17980 9390824   0   0 57346 57348 1357  949  0  2 87 11
>  0  1  71800 2095596  17980 9392720   0   0 57344 57348 1379  987  0  2 86 12
>
> But the new server varied wildly during bonnie++:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b swpd    free   buff   cache  si  so     bi     bo   in   cs us sy id wa
>  0  1    0 4518352  12004 7167000   0   0 118894 120838 2613 1539  0  2 93  5
>  0  1    0 4517252  12004 7167824   0   0  52116  53248 1179  793  0  1 94  5
>  0  1    0 4515864  12004 7169088   0   0  46764  49152 1104  733  0  1 91  7
>  0  1    0 4515180  12012 7169764   0   0  32924  30724  750  542  0  1 93  6
>  0  1    0 4514328  12016 7170780   0   0  42188  45056 1019  664  0  1 90  9
>  0  1    0 4513072  12016 7171856   0   0  67528  65540 1487  993  0  1 96  4
>  0  1    0 4510852  12016 7173160   0   0  56876  57344 1358  942  0  1 94  5
>  0  1    0 4500280  12044 7179924   0   0  91564  94220 2505 2504  1  2 91  6
>  0  1    0 4495564  12052 7183492   0   0 102660 104452 2289 1473  0  2 92  6
>  0  1    0 4492092  12052 7187720   0   0  98498  96274 2140 1385  0  2 93  5
>  0  1    0 4488608  12060 7190772   0   0  97628 100358 2176 1398  0  1 94  4
>  1  0    0 4485880  12052 7192600   0   0 112406 114686 2461 1509  0  3 90  7
>  1  0    0 4483424  12052 7195612   0   0  64678  65536 1449  948  0  1 91  8
>  0  1    0 4480252  12052 7199404   0   0  99608 100356 2217 1452  0  1 96  3
>
> Any ideas where to look next would be greatly appreciated.
>
> Craig
>
>
