
Strange behavior: pgbench and new Linux kernels

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Strange behavior: pgbench and new Linux kernels
Date: 2008-04-17 07:58:43
Message-ID: Pine.GSO.4.64.0804170230180.26917@westnet.com
Lists: pgsql-performance
This week I've finished building and installing OSes on some new hardware 
at home.  I have a pretty standard validation routine I go through to make 
sure PostgreSQL performance is good on any new system I work with.  Found 
a really strange behavior this time around that seems related to changes 
in Linux.  Don't expect any help here, but if someone wanted to replicate 
my tests I'd be curious to see if that can be done.  I tell the story 
mostly because I think it's an interesting tale in hardware and software 
validation paranoia, but there's a serious warning here as well for Linux 
PostgreSQL users.

The motherboard is fairly new, and I couldn't get CentOS 5.1, which ships 
with kernel 2.6.18, to install with the default settings.  I had to drop 
back to "legacy IDE" mode to install.  But it was running everything in 
old-school IDE mode, no DMA or anything.  "hdparm -Tt" showed a whopping
3MB/s on reads.

I pulled down the latest (at the time--only a few hours and I'm already 
behind) Linux kernel, 2.6.24-4, and compiled that with the right modules 
included.  Now I'm getting 70MB/s on simple reads.  Everything looked fine 
from there until I got to the pgbench select-only tests running PG 8.2.7 
(I do 8.2 then 8.3 separately because the checkpoint behavior on 
write-heavy stuff is so different and I want to see both results).

Here's the regular thing I do to see how fast pgbench executes against 
things in memory (but bigger than the CPU's cache):

-Set shared_buffers=256MB, start the server
-dropdb pgbench (if it's already there)
-createdb pgbench
-pgbench -i -s 10 pgbench	(makes about a 160MB database)
-pgbench -S -c <2*cores> -t 10000 pgbench
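
That routine as a quick shell sketch, for anyone who wants to copy it; the
CORES value is something you'd substitute, and it assumes shared_buffers=256MB
is already in postgresql.conf and the server has been restarted:

#!/bin/sh
CORES=4                             # substitute your core count
dropdb pgbench 2>/dev/null          # ignore the error if it's not there yet
createdb pgbench
pgbench -i -s 10 pgbench            # builds roughly a 160MB database
pgbench -S -c $((2 * CORES)) -t 10000 pgbench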

Since the database was just written out, the whole thing will still be in 
the shared_buffers cache, so this should execute really fast.  This was an 
Intel quad-core system, I used -c 8, and that got me around 25K 
transactions/second.  Curious to see how high I could push this, I started 
stepping up the number of clients.
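
Stepping up the clients is nothing fancier than a loop like this (the client
counts and three runs per count just mirror what's summarized further down):

for c in 4 8 9 10 11 12 16 32; do
   for run in 1 2 3; do
     pgbench -S -c $c -t 10000 pgbench | grep excluding
   done
done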

That's where the weird thing happened.  Just by going to 12 clients
instead of 8, I dropped to 8.5K TPS, about 1/3 of what I get from 8 
clients.  It was like that on every test run.  When I use 10 clients, it's 
about 50/50; sometimes I get 25K, sometimes 8.5K.  The only thing it 
seemed to correlate with is that vmstat on the 25K runs showed ~60K 
context switches/second, while the 8.5K ones had ~44K.
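
If someone wants to replicate that observation, a rough way to capture the
context switch rate during a run looks like this (the log name is arbitrary,
and the cs column position can shift between vmstat versions):

vmstat 1 > vmstat.log &
VMSTAT_PID=$!
pgbench -S -c 12 -t 10000 pgbench
kill $VMSTAT_PID
awk 'NR > 2 {print $12}' vmstat.log    # cs is typically the 12th column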

Since I've never seen this before, I went back to my old benchmark system 
with a dual-core AMD processor.  That started with CentOS 4 and kernel 
2.6.9, but I happened to install kernel 2.6.24-3 on there to get better 
support for my Areca card (it goes bonkers regularly on x64 2.6.9). 
Never did a thorough performance test of the new kernel though.  Sure
enough, the same behavior was there, except without a flip-flop point, 
just a sharp decline.  Check this out:

-bash-3.00$ pgbench -S -c 8 -t 10000 pgbench | grep excluding
tps = 15787.684067 (excluding connections establishing)
tps = 15551.963484 (excluding connections establishing)
tps = 14904.218043 (excluding connections establishing)
tps = 15330.519289 (excluding connections establishing)
tps = 15606.683484 (excluding connections establishing)

-bash-3.00$ pgbench -S -c 12 -t 10000 pgbench | grep excluding
tps = 7593.572749 (excluding connections establishing)
tps = 7870.053868 (excluding connections establishing)
tps = 7714.047956 (excluding connections establishing)

Results are consistent, right?  Summarizing that and extending out, here's
what the median TPS numbers look like with 3 tests at each client load:

-c4:  16621	(increased -t to 20000 here)
-c8:  15551	(all these with t=10000)
-c9:  13269
-c10:  10832
-c11:  8993
-c12:  7714
-c16:  7311
-c32:  7141	(cut -t to 5000 here)

Now, somewhere around here I start thinking about CPU cache coherency; I play 
with forcing tasks to particular CPUs and try the deadline scheduler instead 
of the default CFQ, but nothing makes a difference.
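
For reference, the knobs I was poking at were roughly these (the CPU list,
the device name sda, and the data directory path are examples you'd adjust
for your own system):

# pin the postmaster (and therefore new backends) to specific CPUs
taskset -cp 0-3 $(head -1 /var/lib/pgsql/data/postmaster.pid)

# check and switch the I/O scheduler on the data disk
cat /sys/block/sda/queue/scheduler
echo deadline > /sys/block/sda/queue/scheduler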

Wanna guess what did?  An earlier kernel.  These results are the same test 
as above, same hardware, only difference is I used the standard CentOS 4 
2.6.9-67.0.4 kernel instead of 2.6.24-3.

-c4:  18388
-c8:  15760
-c9:  15814	(one result of 12623)
-c12: 14339 	(one result of 11105)
-c16:  14148
-c32:  13647	(one result of 10062)

We get the usual bit of pgbench flakiness, but using the earlier kernel is 
faster in every case, only degrades slowly as clients increase, and is 
almost twice as fast here in a typical high-client load case.

So in the case of this simple benchmark, I see an enormous performance 
regression from the newest Linux kernel compared to a much older one.  I 
need to do some version bisection to nail it down for sure, but my guess 
is it's the change to the Completely Fair Scheduler in 2.6.23 that's to 
blame.  The recent FreeBSD 7.0 PostgreSQL benchmarks at 
http://people.freebsd.org/~kris/scaling/7.0%20and%20beyond.pdf showed an 
equally brutal performance drop going from 2.6.22 to 2.6.23 (see page 16) 
in around the same client load on a read-only test.  My initial guess is 
that I'm getting nailed by a similar issue here.
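
The bisection itself would be the usual git routine against the mainline
tree, with a kernel build, reboot, and pgbench rerun at every step (the tags
here are just the obvious endpoints to start from):

git bisect start
git bisect bad v2.6.24       # kernel showing the regression
git bisect good v2.6.22      # last release before the CFS merge
# build and boot the suggested revision, rerun the pgbench test, then
git bisect good              # or "git bisect bad"; repeat until it converges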

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
