Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-12-03 20:25:02
Message-ID: 4CF9521E.2090708@agliodbs.com
Lists: pgsql-hackers

All,

So, this week I've had my hands on a medium-to-high-end test system where I
could test various wal_sync_methods. This is a 24-core Intel Xeon
machine with 72GB of RAM and 8 internal 10K SAS disks attached to a
RAID controller with 512MB of BBU write cache. 2 of the disks are in a
RAID1, which hosts both an Ext4 partition and an XFS partition. The
remaining 6 disks are in a RAID10 which hosts only a single pgdata
partition.

The system is running RHEL6, Linux kernel 2.6.32-71.el6.x86_64.

I think this kind of system represents our performance-conscious users
much better than testing on people's laptops or on VMs does.

I modified test_fsync in two ways to run this: first, to make it support
O_DIRECT, and second, to make it run in the *current* directory. I think
the second change should be permanent; I imagine that a lot of people
who run test_fsync are not aware that they're actually testing
the performance of /var/tmp, not whatever FS mount they wanted to test.
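
As a rough illustration of those two changes (this is not the modified
test_fsync itself), here is a minimal C sketch that times an "8k write,
fdatasync" loop against a scratch file in the current directory, optionally
opened with O_DIRECT. The file name sync_sketch.tmp, the function name
write_fdatasync_rate, and the fallback to buffered I/O when O_DIRECT is
rejected are all assumptions made for the example.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define BLOCK_SIZE 8192         /* 8k, the write size used in the results */

/* Hypothetical scratch file, created in the current working directory. */
static const char *SCRATCH_FILE = "sync_sketch.tmp";

/* Close the descriptor, remove the scratch file, free the buffer. */
static void
cleanup(int fd, void *buf)
{
    close(fd);
    unlink(SCRATCH_FILE);
    free(buf);
}

/*
 * Run `loops` iterations of "8k write, fdatasync" and return operations
 * per second, or -1.0 on error. If use_direct is nonzero, O_DIRECT is
 * requested; if the filesystem rejects it with EINVAL, this sketch falls
 * back to buffered I/O (an assumption, not test_fsync behavior).
 */
static double
write_fdatasync_rate(int loops, int use_direct)
{
    void *buf = NULL;
    struct timespec start, stop;
    int flags = O_RDWR | O_CREAT | O_TRUNC;
    int fd, i;
    double elapsed;

    if (loops <= 0)
        return -1.0;

    /* O_DIRECT needs an aligned buffer; 4096 covers common block sizes. */
    if (posix_memalign(&buf, 4096, BLOCK_SIZE) != 0)
        return -1.0;
    assert(buf != NULL);
    memset(buf, 'a', BLOCK_SIZE);

    fd = open(SCRATCH_FILE, use_direct ? (flags | O_DIRECT) : flags, 0600);
    if (fd < 0 && use_direct && errno == EINVAL)
        fd = open(SCRATCH_FILE, flags, 0600);   /* fallback: no O_DIRECT */
    if (fd < 0)
    {
        free(buf);
        return -1.0;
    }

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < loops; i++)
    {
        /* Rewrite the same block at offset 0, then force it to disk. */
        if (pwrite(fd, buf, BLOCK_SIZE, 0) != BLOCK_SIZE ||
            fdatasync(fd) != 0)
        {
            cleanup(fd, buf);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &stop);
    cleanup(fd, buf);

    elapsed = (double) (stop.tv_sec - start.tv_sec) +
              (double) (stop.tv_nsec - start.tv_nsec) / 1e9;
    if (elapsed <= 0.0)
        elapsed = 1e-9;         /* guard against a zero-length interval */
    return loops / elapsed;
}
```

Calling write_fdatasync_rate(10000, 1) after cd'ing into the mount under
test would measure that filesystem rather than /var/tmp, which is the point
of the second change.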

Here are the results. I think you'll agree that, at least on Linux, the
benefits of O_SYNC and O_DSYNC as defaults would be highly questionable.
In particular, it seems that if O_DIRECT support is absent, fdatasync is
across-the-board faster:

=============

test_fsync with directIO, on 2 drives, XFS tuned:

Loops = 10000

Simple write:
    8k write                       198629.457/second

Compare file sync methods using one write:
    open_datasync 8k write         14798.263/second
    open_sync 8k write             14316.864/second
    8k write, fdatasync            12198.871/second
    8k write, fsync                12371.843/second

Compare file sync methods using two writes:
    2 open_datasync 8k writes      7362.805/second
    2 open_sync 8k writes          7156.685/second
    8k write, 8k write, fdatasync  10613.525/second
    8k write, 8k write, fsync      10597.396/second

Compare open_sync with different sizes:
    open_sync 16k write            13631.816/second
    2 open_sync 8k writes          7645.038/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
    8k write, fsync, close         11427.096/second
    8k write, close, fsync         11321.220/second

test_fsync with directIO, on 6 drives RAID10, XFS tuned:

Loops = 10000

Simple write:
    8k write                       196494.537/second

Compare file sync methods using one write:
    open_datasync 8k write         14909.974/second
    open_sync 8k write             14559.326/second
    8k write, fdatasync            11046.025/second
    8k write, fsync                11046.916/second

Compare file sync methods using two writes:
    2 open_datasync 8k writes      7349.223/second
    2 open_sync 8k writes          7667.395/second
    8k write, 8k write, fdatasync  9560.495/second
    8k write, 8k write, fsync      9557.287/second

Compare open_sync with different sizes:
    open_sync 16k write            12060.049/second
    2 open_sync 8k writes          7650.746/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
    8k write, fsync, close         9377.107/second
    8k write, close, fsync         9251.233/second

test_fsync without directIO on RAID1, Ext4, data=journal:

Loops = 10000

Simple write:
    8k write                       150514.005/second

Compare file sync methods using one write:
    open_datasync 8k write         4012.070/second
    open_sync 8k write             5476.898/second
    8k write, fdatasync            5512.649/second
    8k write, fsync                5803.814/second

Compare file sync methods using two writes:
    2 open_datasync 8k writes      2910.401/second
    2 open_sync 8k writes          2817.377/second
    8k write, 8k write, fdatasync  5041.608/second
    8k write, 8k write, fsync      5155.248/second

Compare open_sync with different sizes:
    open_sync 16k write            4895.956/second
    2 open_sync 8k writes          2720.875/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
    8k write, fsync, close         4724.052/second
    8k write, close, fsync         4694.776/second

test_fsync without directIO on RAID1, XFS, tuned:

Loops = 10000

Simple write:
    8k write                       199796.208/second

Compare file sync methods using one write:
    open_datasync 8k write         12553.525/second
    open_sync 8k write             12535.978/second
    8k write, fdatasync            12268.298/second
    8k write, fsync                12305.875/second

Compare file sync methods using two writes:
    2 open_datasync 8k writes      6323.835/second
    2 open_sync 8k writes          6285.169/second
    8k write, 8k write, fdatasync  10893.756/second
    8k write, 8k write, fsync      10752.607/second

Compare open_sync with different sizes:
    open_sync 16k write            11053.510/second
    2 open_sync 8k writes          6293.270/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
    8k write, fsync, close         11087.482/second
    8k write, close, fsync         11157.477/second

test_fsync without directIO on RAID10, 6 drives, XFS tuned:

Loops = 10000

Simple write:
    8k write                       197262.003/second

Compare file sync methods using one write:
    open_datasync 8k write         12784.699/second
    open_sync 8k write             12684.512/second
    8k write, fdatasync            12404.547/second
    8k write, fsync                12452.757/second

Compare file sync methods using two writes:
    2 open_datasync 8k writes      6376.587/second
    2 open_sync 8k writes          6364.113/second
    8k write, 8k write, fdatasync  9895.699/second
    8k write, 8k write, fsync      9866.886/second

Compare open_sync with different sizes:
    open_sync 16k write            10156.491/second
    2 open_sync 8k writes          6400.889/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
    8k write, fsync, close         11142.620/second
    8k write, close, fsync         11076.393/second
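
One way to read the two-write numbers above is as a ratio of fdatasync
throughput to open_datasync throughput on each setup. The small C sketch
below does that arithmetic using rates copied from the results; the struct,
the setup labels, and the helper names are made up for the example.

```c
#include <assert.h>
#include <stdio.h>

/* One row per test setup, rates in operations/second ("two writes" case). */
struct two_write_result
{
    const char *setup;
    double      fdatasync_rate;
    double      open_datasync_rate;
};

/* Rates copied from the "two writes" sections of the results above. */
static const struct two_write_result results[] = {
    {"directIO, 2 drives, XFS",   10613.525, 7362.805},
    {"directIO, RAID10, XFS",      9560.495, 7349.223},
    {"no directIO, RAID1, Ext4",   5041.608, 2910.401},
    {"no directIO, RAID1, XFS",   10893.756, 6323.835},
    {"no directIO, RAID10, XFS",   9895.699, 6376.587},
};

#define N_RESULTS (sizeof(results) / sizeof(results[0]))

/* Ratio of two throughput figures; above 1.0 means `a` is the faster one. */
static double
speedup(double a, double b)
{
    assert(b > 0.0);
    return a / b;
}

/* Print fdatasync throughput relative to open_datasync for each setup. */
static void
print_speedups(void)
{
    size_t i;

    for (i = 0; i < N_RESULTS; i++)
        printf("%-26s fdatasync/open_datasync = %.2f\n",
               results[i].setup,
               speedup(results[i].fdatasync_rate,
                       results[i].open_datasync_rate));
}
```

Calling print_speedups() from a main() prints one ratio per setup; every
ratio comes out above 1.0 for the two-write case.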

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
