From: | ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org |
Subject: | WAL: O_DIRECT and multipage-writer |
Date: | 2005-01-25 09:06:23 |
Message-ID: | 20050125164005.BC8A.ITAGAKI.TAKAHIRO@lab.ntt.co.jp |
Lists: | pgsql-hackers pgsql-patches |
Hello, all.
I think that there is room for improvement in WAL.
Here is a patch for it.
- Multiple pages are written in one write() if they are contiguous.
- Add 'open_direct' to wal_sync_method.
The WAL writer currently writes one page per write(). This is not efficient
when wal_sync_method is 'open_sync', because the writer waits for
IO completion at each write(). A multipage writer can reduce the number of
syscalls and improve IO throughput.
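The idea can be sketched as follows. This is only an illustration, not the code in xlog.diff: the helper name write_pages_coalesced, the PAGE_SZ constant, and the layout of the ring as one contiguous array of page slots are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_SZ 8192            /* assumed page size, stands in for BLCKSZ */

/*
 * Hypothetical helper: write npages pages beginning at slot `start` of a
 * ring of `nslots` page buffers that are laid out back to back at `ring`.
 * Pages that are adjacent in memory go out in a single write(); a request
 * that wraps past the end of the ring is split into two runs.
 * Returns the number of write() calls issued, or -1 on error.
 */
static int
write_pages_coalesced(int fd, const char *ring, size_t nslots,
                      size_t start, size_t npages)
{
    int calls = 0;

    assert(start < nslots && npages <= nslots);

    while (npages > 0)
    {
        /* Longest run of pages that is contiguous in memory. */
        size_t      run = nslots - start;
        const char *p;
        size_t      left;

        if (run > npages)
            run = npages;
        p = ring + start * PAGE_SZ;
        left = run * PAGE_SZ;

        /* Normally one write() per run; loop only to finish short writes. */
        while (left > 0)
        {
            ssize_t n = write(fd, p, left);

            if (n <= 0)
                return -1;
            p += n;
            left -= (size_t) n;
            calls++;
        }

        npages -= run;
        start = 0;              /* continue from the head of the ring */
    }
    return calls;
}
```

With a one-page-per-write() loop the same request would cost npages syscalls, and with open_sync each of them would wait for the disk.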
'open_direct' uses O_DIRECT instead of O_SYNC. O_DIRECT implies synchronous
writing, so it may behave much like open_sync. But it may also reduce
memcpy() overhead and save the OS's disk cache memory.
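For readers unfamiliar with O_DIRECT, here is a minimal sketch of what it requires from the caller on Linux. It is not taken from the patch: the names alloc_direct_buffer and open_wal_direct are hypothetical, and DIRECT_ALIGN is an assumed value, since the actual O_DIRECT_BUFFER_ALIGN in the patch is not shown here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DIRECT_ALIGN 4096       /* assumption: alignment required for direct IO */

/*
 * Hypothetical helper: allocate a zeroed buffer whose address and length
 * are both multiples of DIRECT_ALIGN, as O_DIRECT transfers generally need.
 * Returns NULL on allocation failure. Release with free().
 */
static void *
alloc_direct_buffer(size_t size)
{
    void   *buf = NULL;
    size_t  rounded;

    assert(size > 0);
    rounded = (size + DIRECT_ALIGN - 1) / DIRECT_ALIGN * DIRECT_ALIGN;
    if (posix_memalign(&buf, DIRECT_ALIGN, rounded) != 0)
        return NULL;
    memset(buf, 0, rounded);
    return buf;
}

/*
 * Hypothetical helper: open a log file for direct IO where the platform
 * offers O_DIRECT, otherwise (or if that open fails) fall back to O_SYNC.
 * Returns a file descriptor, or -1 on error.
 */
static int
open_wal_direct(const char *path)
{
#ifdef O_DIRECT
    int fd = open(path, O_RDWR | O_CREAT | O_DIRECT, 0600);

    if (fd >= 0)
        return fd;
#endif
    return open(path, O_RDWR | O_CREAT | O_SYNC, 0600);
}
```

Because the kernel transfers data straight from the user buffer, the buffer address, the file offset, and the length all have to respect the alignment, which is why the WAL buffers would need extra padding under this method.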
I benchmarked this patch with pgbench. It works well and improved tps
by more than 50% on my machine (from about 20.6 to 33.8 tps in the
pgbench result below). WAL seems to be a bottleneck on machines with
poor disks.
This patch has not yet been tested enough. I would like it to be reviewed
thoroughly and taken into PostgreSQL.
There are still many TODOs:
- Is this logic really correct?
- O_DIRECT_BUFFER_ALIGN should be determined at run time, not compile time.
- Consider using writev() instead of write().
  Buffers are noncontiguous when the WAL ring buffer wraps around.
- If wal_sync_method is not open_direct, XLOG_EXTRA_BUFFERS can be 0.
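To illustrate the writev() item, a wrapped request could be submitted as two iovec entries in one syscall. This is a rough sketch under assumptions, not part of the patch: writev_ring_pages and PAGE_SZ are hypothetical names, and the ring is assumed to be one contiguous array of page slots.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define PAGE_SZ 8192            /* assumed page size, stands in for BLCKSZ */

/*
 * Hypothetical helper: write npages pages beginning at slot `start` of a
 * ring of `nslots` page buffers with a single writev(). If the request
 * wraps, the tail of the ring and the head of the ring become two iovec
 * entries. Returns the byte count reported by writev(), or -1 on error.
 */
static ssize_t
writev_ring_pages(int fd, char *ring, size_t nslots,
                  size_t start, size_t npages)
{
    struct iovec iov[2];
    int          iovcnt = 0;
    size_t       first;

    assert(start < nslots && npages > 0 && npages <= nslots);

    /* Pages from `start` up to the end of the ring. */
    first = nslots - start;
    if (first > npages)
        first = npages;
    iov[iovcnt].iov_base = ring + start * PAGE_SZ;
    iov[iovcnt].iov_len = first * PAGE_SZ;
    iovcnt++;

    /* Remaining pages, if any, continue at the head of the ring. */
    if (npages > first)
    {
        iov[iovcnt].iov_base = ring;
        iov[iovcnt].iov_len = (npages - first) * PAGE_SZ;
        iovcnt++;
    }

    return writev(fd, iov, iovcnt);
}
```

A caller would still need to handle a short return value, and with open_direct each iovec entry would have to satisfy the alignment rules on its own.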
Sincerely,
ITAGAKI Takahiro
-- pgbench result --
$ ./pgbench -s 100 -c 50 -t 400
- 8.0.0 default + fsync:
tps = 20.630632 (including connections establishing)
tps = 20.636768 (excluding connections establishing)
- multipage-writer + open_direct:
tps = 33.761917 (including connections establishing)
tps = 33.778320 (excluding connections establishing)
Environment:
OS : Linux kernel 2.6.9
CPU : Pentium 4 3GHz
disk : ATA 5400rpm (Data and WAL are placed on same partition.)
memory : 1GB
config : shared_buffers=10000, wal_buffers=256,
XLOG_SEG_SIZE=256MB, checkpoint_segments=4
---
ITAGAKI Takahiro <itagaki(dot)takahiro(at)lab(dot)ntt(dot)co(dot)jp>
NTT Cyber Space Laboratories
Nippon Telegraph and Telephone Corporation.
Attachment | Content-Type | Size |
---|---|---|
xlog.diff | application/octet-stream | 5.9 KB |