Re: possible new option for wal_sync_method

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Dan Scales <scales(at)vmware(dot)com>
Subject: Re: possible new option for wal_sync_method
Date: 2012-02-16 18:32:09
Message-ID: 201202161932.09708.andres@anarazel.de
Lists: pgsql-hackers

Hi,

On Thursday, February 16, 2012 06:18:23 PM Dan Scales wrote:
> When running Postgres on a single ext3 filesystem on Linux, we find that
> the attached simple patch gives significant performance benefit (7-8% in
> numbers below). The patch adds a new option for wal_sync_method, which
> is "open_direct". With this option, the WAL is always opened with
> O_DIRECT (but not O_SYNC or O_DSYNC). For Linux, the use of only
> O_DIRECT should be correct. All WAL logs are fully allocated before
> being used, and the WAL buffers are 8K-aligned, so all direct writes are
> guaranteed to complete before returning. (See
> http://lwn.net/Articles/348739/)
I don't think that behaviour is safe in the face of write caches in the I/O
path. Linux takes care to issue flush/barrier instructions when necessary if
you issue an fsync/fdatasync, but to my knowledge it does not when only
O_DIRECT is used (that would suck performance-wise).
I think that behaviour is safe if you have no externally visible write caching
enabled, but that's not exactly knowledge that is easy to get or to document.

Why should there otherwise be any performance difference between
O_DIRECT|O_SYNC and O_DIRECT in the WAL write case? There is no metadata that
needs to be written, and I have a hard time imagining that the check whether
there is metadata is that expensive.

I guess a more interesting case would be comparing O_DIRECT|O_SYNC with
O_DIRECT + fdatasync() or even O_DIRECT +
sync_file_range(SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE |
SYNC_FILE_RANGE_WAIT_AFTER)
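
A minimal sketch of how those variants differ at the syscall level, so that
they can be timed side by side. It is not taken from the patch: write_block(),
WAL_BLOCK and the retry without O_DIRECT are assumptions made for
illustration, and it leaves out sync_file_range() and any timing. A successful
return only shows that the calls completed, not that the data survived a
volatile write cache.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes O_DIRECT on glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* assumption: degrade to buffered I/O if unavailable */
#endif

#define WAL_BLOCK 8192          /* hypothetical size, matching the 8K alignment quoted above */

/*
 * Write one aligned block to 'path' opened with 'extra_flags', optionally
 * followed by fdatasync().  Returns 0 on success, -1 on any error.
 */
static int
write_block(const char *path, int extra_flags, int do_fdatasync)
{
    void *buf = NULL;
    int   fd;
    int   rc = -1;

    /* O_DIRECT needs a suitably aligned buffer, length and file offset */
    if (posix_memalign(&buf, WAL_BLOCK, WAL_BLOCK) != 0)
        return -1;
    memset(buf, 'x', WAL_BLOCK);

    fd = open(path, O_WRONLY | O_CREAT | extra_flags, 0600);
    if (fd < 0 && errno == EINVAL && (extra_flags & O_DIRECT))
    {
        /* some filesystems (e.g. tmpfs) reject O_DIRECT; retry without it */
        fd = open(path, O_WRONLY | O_CREAT | (extra_flags & ~O_DIRECT), 0600);
    }
    if (fd < 0)
    {
        free(buf);
        return -1;
    }

    if (pwrite(fd, buf, WAL_BLOCK, 0) == WAL_BLOCK)
        rc = do_fdatasync ? fdatasync(fd) : 0;

    if (close(fd) != 0)
        rc = -1;
    free(buf);
    return rc;
}
```

The three cases would then be write_block(path, O_DIRECT, 0) for the proposed
open_direct behaviour, write_block(path, O_DIRECT | O_SYNC, 0) for the
synchronous open, and write_block(path, O_DIRECT, 1) for O_DIRECT followed by
fdatasync().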

Any special reason you did that comparison on ext3? Especially with
data=ordered, its behaviour regarding syncs is pretty insane performance-wise.
Ext4 would be a bit more interesting...

Andres