PSA: New Kernels and intel_idle cpuidle Driver!

From: Shaun Thomas <sthomas(at)optionshouse(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: PSA: New Kernels and intel_idle cpuidle Driver!
Date: 2012-10-26 17:58:57
Lists: pgsql-performance
Hey guys,

I have a pretty nasty heads-up. If you have hardware using an Intel Xeon 
and a newer Linux kernel, you may be experiencing very high CPU wake 
latency. You can check for yourself:

cat /sys/devices/system/cpu/cpuidle/current_driver

If it says intel_idle, the Linux kernel will *aggressively* put your CPU 
to sleep. We definitely noticed this, and it's pretty darn painful. But 
it's *more* painful in your asynchronous, standby, or otherwise less 
busy nodes. Why?

As you can imagine, the secondary nodes don't get much activity, so they 
spend most of their time sleeping. The CPU there is asleep far more 
often, and it pays the wake latency every time it has to copy data or 
process new WAL traffic.
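
For anyone curious how much wake latency is on the table, here is a rough sketch that lists the idle states the kernel exposes for one CPU, along with the exit latency it advertises for each. It assumes the standard cpuidle sysfs layout; the function name and the CPUIDLE_ROOT override are just mine for illustration.

```shell
#!/bin/sh
# Print each idle state of one CPU with its advertised exit latency.
# CPUIDLE_ROOT is a made-up override so this can be pointed at a copy
# of the tree; by default it reads the live sysfs entries.
list_cstates() {
    cpu_dir="${CPUIDLE_ROOT:-/sys/devices/system/cpu}/${1:-cpu0}/cpuidle"
    for state in "$cpu_dir"/state*; do
        # Skip quietly if cpuidle is not exposed on this machine.
        [ -r "$state/name" ] || continue
        printf '%s\t%s us\n' "$(cat "$state/name")" "$(cat "$state/latency")"
    done
}

list_cstates cpu0
```

The latency column is in microseconds, and it is only what the driver claims, not a measurement.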

To fix this, you have to either hint to the driver or outright disable 
it by capping the C-state yourself, probably at the one you wanted in 
the BIOS in the first place. We did this by adding the following options 
to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, but your distro may 
differ.

intel_idle.max_cstate=0 processor.max_cstate=0 idle=mwait
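
In case it helps, here is a minimal sketch of making that edit from a script. It assumes a Debian/Ubuntu style setup where update-grub regenerates the config (other distros use something else), and add_idle_opts is a hypothetical helper of mine, not a standard tool. It leaves a .bak copy next to the file it edits.

```shell
#!/bin/sh
# Options from the message above.
IDLE_OPTS='intel_idle.max_cstate=0 processor.max_cstate=0 idle=mwait'

# Append IDLE_OPTS inside the quotes of GRUB_CMDLINE_LINUX_DEFAULT.
# Takes the file to edit as its only argument.
add_idle_opts() {
    grub_file="$1"
    # Do nothing if the options are already present.
    grep -q 'intel_idle.max_cstate=' "$grub_file" && return 0
    cp "$grub_file" "$grub_file.bak"
    sed "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $IDLE_OPTS\"/" \
        "$grub_file.bak" > "$grub_file"
}

# As root, assuming update-grub exists on your distro:
#   add_idle_opts /etc/default/grub && update-grub
```
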

Then reboot. Here are the benefits we got:

* %util difference between backing device and DRBD went down by 30-40% 
on our replicating nodes.
* TCP RTT dropped by almost 10x.

I'm totally not kidding about that last one. Because of the time needed 
to wake a CPU to handle the network traffic, latency was massively 
higher with the intel_idle driver. Our average RTT on a 10G link was 
0.375ms before. With the settings above, it's 0.04ms.
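
If you want a quick before/after number of your own, something like the sketch below pulls the average out of ping's summary line. Note the assumptions: it uses ICMP ping as a stand-in (so it is not the same thing as TCP RTT), it expects the Linux iputils summary format, and both avg_rtt and the host name standby-node are placeholders of mine.

```shell
#!/bin/sh
# Read ping output on stdin and print the average RTT in milliseconds.
# Expects a summary line such as:
#   rtt min/avg/max/mdev = 0.030/0.040/0.055/0.006 ms
avg_rtt() {
    awk -F'/' '/^rtt/ { print $5 }'
}

# Example, with standby-node standing in for your replication peer:
#   ping -q -c 100 standby-node | avg_rtt
```

Run it once before the change and once after the reboot, on an otherwise quiet link, and compare the two numbers.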

Consider this a PSA. DRBD is unfairly being blamed for bad performance 
with the intel_idle cpuidle driver in newer kernels! If you have DRBD on 
a newer Intel system, I highly recommend you make the above changes, 
especially since it directly affects your replication speed.

It took us days to figure this out, so I figured I'd share.

Thanks, everyone!

Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604

