Re: WAL: O_DIRECT and multipage-writer

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: WAL: O_DIRECT and multipage-writer
Date: 2005-02-14 23:25:04
Message-ID: 200502142325.j1ENP4e19810@candle.pha.pa.us
Lists: pgsql-hackers pgsql-patches


This thread has been saved for the 8.1 release:

http://momjian.postgresql.org/cgi-bin/pgpatches2

---------------------------------------------------------------------------

ITAGAKI Takahiro wrote:
> Hello, all.
>
> I think there is room for improvement in WAL.
> Here is a patch for it.
> - Multiple pages are written in one write() if they are contiguous.
> - Add 'open_direct' to wal_sync_method.
>
> The WAL writer writes one page per write(). This is not efficient
> when wal_sync_method is 'open_sync', because the writer waits for
> I/O completion on each write(). A multipage writer can reduce syscalls
> and improve I/O throughput.
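
The multipage idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: `PAGE_SZ`, `contiguous_run` and `flush_pages` are hypothetical names, and the 8192-byte page size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_SZ 8192            /* assumed WAL page size (hypothetical constant) */

/*
 * Count how many pages, starting at index `start`, are adjacent in memory
 * and can therefore be handed to a single write().
 */
size_t
contiguous_run(char *const pages[], size_t npages, size_t start)
{
    size_t n = 1;

    assert(start < npages);
    while (start + n < npages &&
           pages[start + n] == pages[start + n - 1] + PAGE_SZ)
        n++;
    return n;
}

/*
 * Write all pages to fd, issuing one write() per contiguous run instead of
 * one per page. Returns the number of runs flushed, or -1 on error.
 */
int
flush_pages(int fd, char *const pages[], size_t npages)
{
    int    runs = 0;
    size_t i = 0;

    while (i < npages)
    {
        size_t run = contiguous_run(pages, npages, i);
        size_t len = run * PAGE_SZ;
        size_t done = 0;

        while (done < len)      /* retry after a short write */
        {
            ssize_t w = write(fd, pages[i] + done, len - done);

            if (w < 0)
                return -1;
            done += (size_t) w;
        }
        runs++;
        i += run;
    }
    return runs;
}
```

With a synchronous open mode, each write() waits for the disk, so fewer and larger writes mean fewer waits for the same amount of WAL.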
>
> 'open_direct' uses O_DIRECT instead of O_SYNC. O_DIRECT implies synchronous
> writing, so it may behave much like open_sync. But it may also reduce
> memcpy() calls and save the OS's disk cache memory.
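
As a rough sketch of what direct I/O asks of the caller: the buffer address and the transfer size generally have to be aligned. The 4096-byte `DIRECT_ALIGN` and the helper names below are assumptions for illustration, not values or functions from the patch. Note also that the Linux open(2) manual page says O_DIRECT on its own does not give the guarantees of O_SYNC, so the two flags may need to be combined where durability matters.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT in <fcntl.h> on Linux */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DIRECT_ALIGN 4096       /* assumed alignment; the real value is system-dependent */

/* Return nonzero if p sits on a DIRECT_ALIGN boundary. */
int
is_aligned(const void *p)
{
    return ((uintptr_t) p % DIRECT_ALIGN) == 0;
}

/* Allocate a zeroed buffer whose address and size suit direct I/O. */
void *
alloc_wal_buffer(size_t size)
{
    void *p = NULL;

    assert(size > 0 && size % DIRECT_ALIGN == 0);
    if (posix_memalign(&p, DIRECT_ALIGN, size) != 0)
        return NULL;
    memset(p, 0, size);
    return p;
}

/* Open a file for direct writes where the platform defines O_DIRECT. */
int
open_wal_direct(const char *path)
{
    int flags = O_WRONLY | O_CREAT;

#ifdef O_DIRECT
    flags |= O_DIRECT;          /* bypass the OS page cache */
#endif
    return open(path, flags, 0600);
}
```

Some filesystems reject O_DIRECT at open() time, so a caller would likely want to check the return value and fall back to another sync method.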
>
> I benchmarked this patch with pgbench. It works well and
> improved tps by more than 50% on my machine. WAL seems to be the bottleneck
> on machines with slow disks.
>
> This patch has not yet been tested enough. I would like it to be reviewed
> thoroughly and accepted into PostgreSQL.
>
> There are still many TODOs:
> * Is this logic really correct?
> - O_DIRECT_BUFFER_ALIGN should be determined at run time, not compile time.
> - Consider using writev() instead of write().
>   Buffers are noncontiguous when the WAL ring buffer wraps around.
> - If wal_sync_method is not open_direct, XLOG_EXTRA_BUFFERS can be 0.
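
The writev() item in the list above could look something like the following. This is only a sketch under assumptions: `ring_to_iov` and `flush_ring` are hypothetical helpers, and the ring buffer is modelled as a plain byte array rather than the actual WAL buffer layout.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Describe the region [start, start + len) of a ring buffer as one or two
 * iovec entries. Returns the number of entries filled in (0 if len is 0).
 */
int
ring_to_iov(char *ring, size_t ring_size, size_t start, size_t len,
            struct iovec iov[2])
{
    size_t first;

    assert(start < ring_size && len <= ring_size);
    if (len == 0)
        return 0;

    first = ring_size - start;      /* bytes available before the wrap */
    iov[0].iov_base = ring + start;
    if (len <= first)
    {
        iov[0].iov_len = len;       /* region does not wrap */
        return 1;
    }
    iov[0].iov_len = first;
    iov[1].iov_base = ring;         /* remainder continues at the start */
    iov[1].iov_len = len - first;
    return 2;
}

/* Flush the region with a single writev(); returns bytes written or -1. */
ssize_t
flush_ring(int fd, char *ring, size_t ring_size, size_t start, size_t len)
{
    struct iovec iov[2];
    int          n = ring_to_iov(ring, ring_size, start, len, iov);

    if (n == 0)
        return 0;
    return writev(fd, iov, n);
}
```

A real implementation would also need to handle partial writes, and with O_DIRECT each segment would presumably still have to meet the alignment requirements.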
>
>
> Sincerely,
> ITAGAKI Takahiro
>
>
>
> -- pgbench result --
>
> $ ./pgbench -s 100 -c 50 -t 400
>
> - 8.0.0 default + fsync:
> tps = 20.630632 (including connections establishing)
> tps = 20.636768 (excluding connections establishing)
> - multipage-writer + open_direct:
> tps = 33.761917 (including connections establishing)
> tps = 33.778320 (excluding connections establishing)
>
> Environment:
> OS : Linux kernel 2.6.9
> CPU : Pentium 4 3GHz
> disk : ATA 5400rpm (Data and WAL are placed on same partition.)
> memory : 1GB
> config : shared_buffers=10000, wal_buffers=256,
> XLOG_SEG_SIZE=256MB, checkpoint_segments=4
>
> ---
> ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>
> NTT Cyber Space Laboratories
> Nippon Telegraph and Telephone Corporation.

[ Attachment, skipping... ]


--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
