Re: Load distributed checkpoint

From: "Takayuki Tsunakawa" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
To: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Load distributed checkpoint
Date: 2006-12-11 12:18:26
Message-ID: 039501c71d1e$75e4c5e0$19527c0a@OPERAO
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Hello,

From: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
"Takayuki Tsunakawa" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com> wrote:
>> I'm afraid it is difficult for system designers to expect steady
>> throughput/response time, as long as PostgreSQL depends on the
>> flushing of file system cache. How does Oracle provide stable
>> performance?
>> Though I'm not sure, isn't it the key to use O_SYNC so that
write()s
>> transfer data to disk?
>
> AFAIK, other databases use write() and fsync() in combination. They
call
> fsync() immediately after they write buffers in some small batches.
Otherwise,
> they uses asynchronous and direct I/O options. Therefore, dirty
pages in
> kernel buffers are keeped to be low at any time.

Oracle seems to use O_SYNC. I've found a related page in the Oracle
manuals.

--------------------------------------------------
http://download-west.oracle.com/docs/cd/B19306_01/win.102/b15688/ap_unix.htm
Direct Writes to Disk
On both UNIX and Windows platforms, bypassing the file system buffer
cache ensures data is written to disk.
On UNIX, Oracle Database uses the O_SYNC flag to bypass the file
system buffer cache. The flag name depends on the UNIX port.
On Windows, Oracle Database bypasses the file system buffer cache
completely.
--------------------------------------------------

As Itagaki-san says, asynchronous+direct I/O provides best
performance, I believe. Oracle supplies the combination as follows:

--------------------------------------------------
http://download-west.oracle.com/docs/cd/B19306_01/server.102/b15658/appc_linux.htm#sthref870
Oracle Database supports kernel asynchronous I/O. This feature is
disabled by default.
By default, the DISK_ASYNCH_IO initialization parameter in the
parameter file is set to TRUE to enable asynchronous I/O on raw
devices. To enable asynchronous I/O on file system files:
Ensure that all Oracle Database files are located on file systems that
support asynchronous I/O.
Set the FILESYSTEMIO_OPTIONS initialization parameter in the parameter
file to one of the following values:

Linux Distribution Recommended Value
SUSE Linux Enterprise Server 9 SETALL
Other distributions ASYNCH
--------------------------------------------------

I believe SQL Server also uses direct+asynchronous I/O, because
Microsoft recommends in MSDN that the combination plus appropriate
multi-threading provides best performance.

I tested Oracle9i on RHEL 2.1. I straced DBWn (database writer, which
is like the bgwriter of PostgreSQL) while creating tables, indexes,
etc. and shutting down the database server. Oracle surely uses the
O_SYNC as follows, but it doesn't use fsync().

24462 open("/work4/ora/tuna/users01.dbf", O_RDWR|O_SYNC|O_LARGEFILE) =
16

I wonder how the other big DBMS, IBM DB2, handles this. Is Itagaki-san
referring to DB2?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Takayuki Tsunakawa 2006-12-11 12:22:44 Re: Load distributed checkpoint
Previous Message Richard Huxton 2006-12-11 12:05:37 Re: EXPLAIN ANALYZE

Browse pgsql-patches by date

  From Date Subject
Next Message Takayuki Tsunakawa 2006-12-11 12:22:44 Re: Load distributed checkpoint
Previous Message ITAGAKI Takahiro 2006-12-11 10:38:16 Re: Load distributed checkpoint