
8.3.9 - latency spikes with Linux (and tuning for consistently low latency)

From: Marinos Yannikos <mjy(at)geizhals(dot)at>
To: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: 8.3.9 - latency spikes with Linux (and tuning for consistently low latency)
Date: 2010-04-15 16:46:00
Message-ID: 4BC742C8.2050301@geizhals.at
Lists: pgsql-performance
Hi,

We are seeing latency spikes in the 2-3 second range (sometimes 8-10s) for
queries that usually take 3-4ms on our systems, and I am running out of things to
try to get rid of them. Perhaps someone here has more ideas. Here's a
description of the systems and what I've tried, with no impact at all:

2 x 6-core Opterons (2431)
32GB RAM
2 SATA disks (WD1500HLFS) in software RAID-1
Linux 2.6.26 64 bit (Debian kernel)
PostgreSQL 8.3.9 (Debian package)
FS mounted with option noatime
vm.dirty_ratio = 80

3 DB clusters, 2 of which are actively used, all on the same RAID-1 FS
fsync=off
shared_buffers=5GB (database size is ~4.7GB on disk right now)
temp_buffers=50MB
work_mem=500MB
wal_buffers=256MB (*)
checkpoint_segments=256 (*)
commit_delay=100000 (*)
autovacuum=off (*)

(*) added while testing; none of these changed the spikes at all

The databases have a moderate read load (no bursts, typical web backend) and a
fairly regular write load: updates arrive in batches of a few hundred to a few
thousand rows, always as single-row updates/deletes/inserts by primary key (90%
updates), without explicit transactions or locking.

This is how long the queries take (seen from the client):
Thu Apr 15 18:16:14 CEST 2010 real      0m0.004s
Thu Apr 15 18:16:15 CEST 2010 real      0m0.004s
Thu Apr 15 18:16:16 CEST 2010 real      0m0.003s
Thu Apr 15 18:16:17 CEST 2010 real      0m0.005s
Thu Apr 15 18:16:18 CEST 2010 real      0m0.068s
Thu Apr 15 18:16:19 CEST 2010 real      0m0.004s
Thu Apr 15 18:16:20 CEST 2010 real      0m0.005s
Thu Apr 15 18:16:21 CEST 2010 real      0m0.235s
Thu Apr 15 18:16:22 CEST 2010 real      0m0.005s
Thu Apr 15 18:16:23 CEST 2010 real      0m3.006s <== !
Thu Apr 15 18:16:27 CEST 2010 real      0m0.004s
Thu Apr 15 18:16:28 CEST 2010 real      0m0.084s
Thu Apr 15 18:16:29 CEST 2010 real      0m0.003s
Thu Apr 15 18:16:30 CEST 2010 real      0m0.005s
Thu Apr 15 18:16:32 CEST 2010 real      0m0.038s
Thu Apr 15 18:16:33 CEST 2010 real      0m0.005s
Thu Apr 15 18:16:34 CEST 2010 real      0m0.005s
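
A minimal sketch of how timing lines like the ones above could be scanned for
spikes automatically. It assumes each line contains the "real 0mS.SSSs" part
printed by the shell's time builtin; the function names and the 1-second
threshold are illustrative choices, not part of the original measurement setup.

```python
import re

# Matches the elapsed-time part of `time` output, e.g. "real      0m3.006s".
REAL_RE = re.compile(r"real\s+(\d+)m(\d+(?:\.\d+)?)s")


def parse_real_seconds(line):
    """Return the elapsed wall-clock time in seconds, or None if absent."""
    m = REAL_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)) * 60 + float(m.group(2))


def find_spikes(lines, threshold_s=1.0):
    """Return (line, seconds) pairs whose elapsed time is >= threshold_s."""
    spikes = []
    for line in lines:
        secs = parse_real_seconds(line)
        if secs is not None and secs >= threshold_s:
            spikes.append((line.strip(), secs))
    return spikes


# Three lines copied from the measurements above.
sample = [
    "Thu Apr 15 18:16:22 CEST 2010 real      0m0.005s",
    "Thu Apr 15 18:16:23 CEST 2010 real      0m3.006s <== !",
    "Thu Apr 15 18:16:27 CEST 2010 real      0m0.004s",
]

for line, secs in find_spikes(sample):
    print(f"{secs:.3f}s  {line}")
```

Run over a longer capture, the same filter gives a list of spike timestamps
that can be compared against other system logs.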

The spikes aren't periodic (i.e. not every 10, 20 or 30 seconds, or every 5
minutes); they seem completely random. PostgreSQL also reports (due to
log_min_duration_statement=1000) small bursts of queries that take much longer
than they should:

[nothing for a few minutes]
2010-04-15 16:50:03 CEST LOG:  duration: 8995.934 ms  statement: select ...
2010-04-15 16:50:04 CEST LOG:  duration: 3383.780 ms  statement: select ...
2010-04-15 16:50:04 CEST LOG:  duration: 3328.523 ms  statement: select ...
2010-04-15 16:50:05 CEST LOG:  duration: 1120.108 ms  statement: select ...
2010-04-15 16:50:05 CEST LOG:  duration: 1079.879 ms  statement: select ...
[nothing for a few minutes]
(explain analyze yields 5-17ms for the above queries)
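
The bursts in the server log can be grouped mechanically, which makes it easier
to count them and see how long each one lasted. The sketch below assumes the
log line layout shown above (timestamp, time zone, "LOG:  duration: ... ms");
a different log_line_prefix would need a different pattern. The helper names
and the 5-second gap are hypothetical choices for illustration.

```python
import re
from datetime import datetime

# Assumed layout: "2010-04-15 16:50:03 CEST LOG:  duration: 8995.934 ms  statement: ..."
LOG_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+ LOG:\s+duration: (\d+(?:\.\d+)?) ms"
)


def parse_slow_statements(lines):
    """Return (timestamp, duration_ms) for every matching log line."""
    entries = []
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            entries.append((ts, float(m.group(2))))
    return entries


def group_bursts(entries, max_gap_s=5):
    """Group entries whose timestamps are at most max_gap_s apart."""
    bursts = []
    for ts, ms in sorted(entries):
        if bursts and (ts - bursts[-1][-1][0]).total_seconds() <= max_gap_s:
            bursts[-1].append((ts, ms))
        else:
            bursts.append([(ts, ms)])
    return bursts


# Log lines copied from the excerpt above; non-matching lines are skipped.
sample_log = [
    "[nothing for a few minutes]",
    "2010-04-15 16:50:03 CEST LOG:  duration: 8995.934 ms  statement: select ...",
    "2010-04-15 16:50:04 CEST LOG:  duration: 3383.780 ms  statement: select ...",
    "2010-04-15 16:50:04 CEST LOG:  duration: 3328.523 ms  statement: select ...",
    "2010-04-15 16:50:05 CEST LOG:  duration: 1120.108 ms  statement: select ...",
    "2010-04-15 16:50:05 CEST LOG:  duration: 1079.879 ms  statement: select ...",
]

for burst in group_bursts(parse_slow_statements(sample_log)):
    worst = max(ms for _, ms in burst)
    print(f"{burst[0][0]}  statements={len(burst)}  worst={worst:.0f} ms")
```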

Things I've tried apart from the PostgreSQL parameters above:
- switching from ext3 with default journal settings to data=writeback
- switching to ext2
- vm.dirty_background_ratio set to 1, 10, 20, 60
- vm.dirty_expire_centisecs set to 3000 (default), 8640000 (1 day)
- fsync on
- an unofficial Debian 2.6.32 kernel and ext3 with data=writeback (because of
http://lwn.net/Articles/328363/ although it seems to address fsync latency and
not read latency)
- running irqbalance

All these had no visible impact on the latency spikes.
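
Since several of the settings above concern dirty page writeback, one way to
check whether a spike coincides with writeback activity is to sample the Dirty
and Writeback fields of /proc/meminfo alongside the query timings. This is only
a sketch: the function names are hypothetical, and the sample values below are
made up for illustration, not measurements from the systems described here.

```python
import re


def parse_meminfo(text, fields=("Dirty", "Writeback")):
    """Extract the selected fields (in kB) from /proc/meminfo-style text."""
    values = {}
    for line in text.splitlines():
        m = re.match(r"(\w+):\s+(\d+)\s*kB", line)
        if m and m.group(1) in fields:
            values[m.group(1)] = int(m.group(2))
    return values


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())


# Illustrative input only; on a real system call read_meminfo() in a loop
# (e.g. once per second) and log the result next to the query timings.
sample = (
    "MemTotal:       32948736 kB\n"
    "Dirty:              1204 kB\n"
    "Writeback:             0 kB\n"
)
print(parse_meminfo(sample))
```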

I can also exclude faulty hardware with some certainty (since we have 12 
identical systems with this problem).

I suspect some strange software RAID or kernel problem, unless the default
bgwriter settings can actually cause selects to get stuck for this long when
there are too many dirty buffers (I hope not). Unless I'm missing something, the
only things I have left to try are a non-RAID setup, ramdisks (tmpfs), or SSDs,
so any suggestion will be greatly appreciated. More generally, I'd be very
interested in hearing how people tune their databases and their hardware/Linux
for consistently low query latency (esp. when everything should fit in memory).

Regards,
  Marinos
