
Re: O_DIRECT for WAL writes

From: Neil Conway <neilc(at)samurai(dot)com>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: O_DIRECT for WAL writes
Date: 2005-05-30 06:29:40
Message-ID: 1117434580.23266.31.camel@localhost.localdomain
Lists: pgsql-hackers, pgsql-patches

On Mon, 2005-05-30 at 10:59 +0900, ITAGAKI Takahiro wrote:
> Yes, I've tested pgbench and dbt2 and their performances have improved.
> The two results are as follows:
> 
> 1. pgbench -s 100 on one Pentium4, 1GB mem, 2 ATA disks, Linux 2.6.8
>    (attached image)
>   tps  | wal_sync_method
> -------+-------------------------------------------------------
>  147.0 | open_direct + write multipage (previous patch)
>  147.2 | open_direct (this patch)
>  109.9 | open_sync

I'm surprised this makes as much of a difference as that benchmark would
suggest. I wonder if we're benchmarking the right thing, though: is
opening a file with O_DIRECT sufficient to ensure that a write(2) does
not return until the data has hit disk? (As would be the case with
O_SYNC.) O_DIRECT means the OS will attempt to minimize caching, but
that is not necessarily the same thing: for example, I can imagine an
implementation in which the kernel would submit the appropriate I/O to
the disk when it sees a write(2) on a file opened with O_DIRECT, but
then let the write(2) return before getting confirmation from the disk
that the I/O has succeeded or failed. From googling, the MySQL
documentation for innodb_flush_method notes:

        This option is only relevant on Unix systems. If set to
        fdatasync, InnoDB uses fsync() to flush both the data and log
        files. If set to O_DSYNC, InnoDB uses O_SYNC to open and flush
        the log files, but uses fsync() to flush the datafiles. If
        O_DIRECT is specified (available on some GNU/Linux versions
        starting from MySQL 4.0.14), InnoDB uses O_DIRECT to open the
        datafiles, and uses fsync() to flush both the data and log
        files.
        
That would suggest O_DIRECT by itself is not sufficient to force a flush
to disk -- if anyone has some more definitive evidence that would be
welcome.

Anyway, if the above is true, we'll need to use O_DIRECT as well as one
of the existing wal_sync_methods.
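
To make that combination concrete, here is a minimal sketch, assuming Linux and glibc. The helper name write_direct_and_flush is hypothetical and does not come from the patch. It opens the file with O_DIRECT to bypass the page cache, then calls fdatasync() explicitly rather than trusting O_DIRECT alone to wait for the disk. The fallback to a plain open() on EINVAL is an assumption about filesystems that reject O_DIRECT (tmpfs, for instance).

```c
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0              /* no O_DIRECT here: plain write plus flush */
#endif

/*
 * Hypothetical helper, not from the patch: write len bytes to path through
 * an O_DIRECT descriptor, then flush explicitly with fdatasync().
 * The caller must supply a buffer and length that satisfy the platform's
 * O_DIRECT alignment rules. Returns 0 on success, -1 on error.
 */
int
write_direct_and_flush(const char *path, const void *buf, size_t len)
{
    int     fd;
    ssize_t written;

    assert(path != NULL && buf != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0600);
    if (fd < 0 && errno == EINVAL)
        fd = open(path, O_WRONLY | O_CREAT, 0600);  /* fs rejects O_DIRECT */
    if (fd < 0)
        return -1;

    /* A short write is treated as an error to keep the sketch small. */
    written = write(fd, buf, len);
    if (written != (ssize_t) len || fdatasync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Passing O_SYNC (or O_DSYNC) to open() alongside O_DIRECT would be the other way to get the same guarantee; which of the two is faster is exactly what a benchmark would need to show.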

BTW, from the patch:

+ /* TODO: Aligment depends on OS and filesystem. */
+ #define O_DIRECT_BUFFER_ALIGN	4096

I suppose there's no reasonable way to autodetect this, so we'll need to
expose it as a GUC variable (or perhaps a configure option), which is a
bit unfortunate.
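
As a rough illustration of what a run-time setting might look like, the sketch below allocates the write buffer with posix_memalign() using an alignment taken from a variable instead of a compile-time #define. The names direct_buffer_align and alloc_direct_buffer are hypothetical stand-ins; this is not how the patch or the GUC machinery does it, and the 4096 default simply mirrors the constant in the patch.

```c
#define _POSIX_C_SOURCE 200112L /* for posix_memalign */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a configurable setting (default as in the patch). */
static size_t direct_buffer_align = 4096;

/*
 * Hypothetical helper: allocate at least size bytes, aligned to
 * direct_buffer_align, with the size rounded up to a multiple of the
 * alignment. Returns NULL if the setting is invalid or allocation fails.
 * The caller releases the buffer with free().
 */
void *
alloc_direct_buffer(size_t size)
{
    size_t  align = direct_buffer_align;
    void   *buf = NULL;

    /* posix_memalign needs a power of two that is a multiple of sizeof(void *). */
    if (align < sizeof(void *) || (align & (align - 1)) != 0)
        return NULL;

    /* Round the request up so the write length is a multiple of the alignment. */
    size = (size + align - 1) & ~(align - 1);

    if (posix_memalign(&buf, align, size) != 0)
        return NULL;

    assert(((uintptr_t) buf & (align - 1)) == 0);
    return buf;
}
```

Rejecting values that are not a power of two at allocation time keeps a bad setting from turning into EINVAL failures later on write(2).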

-Neil


