Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Jan Kara <jack(at)suse(dot)cz>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Layton <jlayton(at)redhat(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Dave Chinner <david(at)fromorbit(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Joshua Drake <jd(at)commandprompt(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Mel Gorman <mgorman(at)suse(dot)de>, Jim Nasby <jim(at)nasby(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "lsf-pc(at)lists(dot)linux-foundation(dot)org" <lsf-pc(at)lists(dot)linux-foundation(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Date: 2014-01-22 21:55:36
Message-ID: 20140122215536.GF6346@momjian.us
Lists: pgsql-hackers

On Tue, Jan 21, 2014 at 09:20:52PM +0100, Jan Kara wrote:
> > If we're forcing the WAL out to disk because of transaction commit or
> > because we need to write the buffer protected by a certain WAL record
> > only after the WAL hits the platter, then it's fine. But sometimes
> > we're writing WAL just because we've run out of internal buffer space,
> > and we don't want to block waiting for the write to complete. Opening
> > the file with O_SYNC deprives us of the ability to control the timing
> > of the sync relative to the timing of the write.
> O_SYNC has a heavy performance penalty. For ext4 it means an extra fs
> transaction commit whenever there's any metadata changed on the filesystem.
> Since mtime & ctime of files will be changed often, this will be the case
> very often.

Also, there is the issue of writes that don't need syncing being synced
because sync is set on the file descriptor. Here is output from our
pg_test_fsync tool when run on an SSD with a BBU:

$ pg_test_fsync
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                      n/a
        fdatasync                     8424.785 ops/sec     119 usecs/op
        fsync                         7127.072 ops/sec     140 usecs/op
        fsync_writethrough                 n/a
        open_sync                    10548.469 ops/sec      95 usecs/op

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                      n/a
        fdatasync                     4367.375 ops/sec     229 usecs/op
        fsync                         4427.761 ops/sec     226 usecs/op
        fsync_writethrough                 n/a
        open_sync                     4303.564 ops/sec     232 usecs/op

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
--> 1 * 16kB open_sync write     4938.711 ops/sec   202 usecs/op
--> 2 * 8kB open_sync writes     4233.897 ops/sec   236 usecs/op
--> 4 * 4kB open_sync writes     2904.710 ops/sec   344 usecs/op
--> 8 * 2kB open_sync writes     1736.720 ops/sec   576 usecs/op
--> 16 * 1kB open_sync writes     935.917 ops/sec  1068 usecs/op
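
To make the arithmetic behind this table explicit, here is a small sketch (Python, not part of pg_test_fsync; the variable and function names are my own). It divides each usecs/op figure by the number of writes it covers, using the numbers copied from the table above.

```python
# usecs/op figures copied from the open_sync table above,
# keyed by (number of writes, size of each write in kB).
results = {
    (1, 16): 202,
    (2, 8): 236,
    (4, 4): 344,
    (8, 2): 576,
    (16, 1): 1068,
}


def per_write_usecs(results):
    """Return the average cost of a single open_sync write for each row."""
    return {key: usecs / key[0] for key, usecs in results.items()}


per_write = per_write_usecs(results)
for (count, size_kb), usecs in per_write.items():
    print(f"{count:2d} * {size_kb:2d}kB: {usecs:7.2f} usecs per write")

# Total cost of sixteen 1kB writes relative to one 16kB write.
slowdown = results[(16, 1)] / results[(1, 16)]
print(f"16 * 1kB takes {slowdown:.1f}x as long as 1 * 16kB")
```

On this hardware each synced write costs roughly 67 usecs even at 1kB, so splitting the same 16kB into sixteen writes takes about five times as long as writing it in one piece.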

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        write, fsync, close           7626.783 ops/sec     131 usecs/op
        write, close, fsync           6492.697 ops/sec     154 usecs/op
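
For readers unfamiliar with this test, the "write, close, fsync" sequence can be sketched roughly as below. This is a Python illustration under my own assumptions (the helper name and file name are hypothetical), not the tool's actual C code. It only shows the order of calls; it cannot prove that the data reached stable storage, which is why the tool compares timings instead.

```python
import os
import tempfile


def write_close_fsync(path, data):
    """Write on one descriptor, close it, then fsync via a second descriptor."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        written = os.write(fd, data)
    finally:
        os.close(fd)

    # Reopen the same file; this descriptor never wrote anything itself.
    fd2 = os.open(path, os.O_RDWR)
    try:
        os.fsync(fd2)
    finally:
        os.close(fd2)
    return written


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "fsync_test.dat")  # hypothetical file name
    nbytes = write_close_fsync(path, b"\0" * 8192)  # one 8kB write
    size = os.path.getsize(path)
    print(nbytes, size)
```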

Non-Sync'ed 8kB writes:
        write                       351517.178 ops/sec       3 usecs/op
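
The trade-off quoted at the top of this message, where O_SYNC ties a sync to every write while a separate fdatasync() lets the caller choose when (or whether) to sync, can be sketched as follows. This is a minimal Python illustration, not PostgreSQL code; the helper names and the file name are hypothetical, and it assumes a Unix-like platform (with a fallback where O_SYNC or fdatasync is missing).

```python
import os
import tempfile

BLOCK = b"\0" * 8192  # one 8kB block, matching the tests above

# Assumption: Linux provides both; fall back elsewhere so the sketch still runs.
O_SYNC = getattr(os, "O_SYNC", 0)
datasync = getattr(os, "fdatasync", os.fsync)


def write_with_o_sync(path, nblocks):
    """Open with O_SYNC: every write() waits until the data is durable."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | O_SYNC, 0o600)
    try:
        return sum(os.write(fd, BLOCK) for _ in range(nblocks))
    finally:
        os.close(fd)


def write_then_sync(path, nblocks, sync_now):
    """Plain write()s return quickly; the caller decides whether to sync."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        total = sum(os.write(fd, BLOCK) for _ in range(nblocks))
        if sync_now:
            datasync(fd)  # one sync covers all preceding writes
        return total
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "wal_segment.dat")  # hypothetical file name
    synced_each = write_with_o_sync(path, 2)
    synced_once = write_then_sync(path, 2, sync_now=True)
    unsynced = write_then_sync(path, 2, sync_now=False)
    print(synced_each, synced_once, unsynced)
```

With the second helper, a write that only exists to free up buffer space can pass sync_now=False and never block on the device, which is the control that O_SYNC takes away.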

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +
