Re: Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: mudfoot(at)rawbw(dot)com
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options
Date: 2004-09-13 14:38:08
Message-ID: 200409131438.i8DEc8r04384@candle.pha.pa.us
Lists: pgsql-performance


Have you seen /src/tools/fsync?

---------------------------------------------------------------------------

mudfoot(at)rawbw(dot)com wrote:
> Hi, I'd like to help with the topic in the Subject: line. It seems to be a
> TODO item. I've reviewed some threads discussing the matter, so I hope I've
> acquired enough history concerning it. I've taken an initial swipe at
> figuring out how to optimize sync'ing methods. It's based largely on
> recommendations I've read on previous threads about fsync/O_SYNC and so on.
> After reviewing, if anybody has recommendations on how to proceed then I'd
> love to hear them.
>
> Attached is a little program that basically does a bunch of sequential writes
> to a file. All of the sync'ing methods supported by PostgreSQL WAL can be
> used. Results are printed in microseconds. Size and quantity of writes are
> configurable. The documentation is in the code (how to configure, build, run,
> etc.). I realize that this program doesn't reflect all of the possible
> activities of a production database system, but I hope it's a step in the
> right direction for this task. I've used it to see differences in behavior
> between the various sync'ing methods on various platforms.
>
> Here's what I've found running the benchmark on some systems to which
> I have access. The differences in behavior between platforms are quite vast.
>
> Summary first...
>
> <halfjoke>
> PostgreSQL should be run on an old Apple Macintosh attached to
> its own Hitachi disk array with 2GB cache or so. Use any sync method
> except for fsync().
> </halfjoke>
>
> Anyway, there is *a lot* of variance in file synching behavior across
> different hardware and O/S platforms. It's probably not safe
> to conclude much. That said, here are some findings so far based on
> tests I've run:
>
> 1. under no circumstances do fsync() or fdatasync() seem to perform
> better than opening files with O_SYNC or O_DSYNC
> 2. where there are differences, opening files with O_SYNC or O_DSYNC
> tends to be quite a bit faster.
> 3. fsync() seems to be the slowest where there are differences. And
> O_DSYNC seems to be the fastest where results differ.
> 4. the safest thing to assert at this point is that
> Solaris systems ought to use the O_DSYNC method for WAL.
>
> -----------
>
> Test system(s)
>
> Athlon Linux:
> AMD Athlon XP2000, 512MB RAM, single (5400 or 7200?) RPM 20GB IDE disk,
> reiserfs filesystem (version 3.something, I think)
> SuSE Linux kernel 2.4.21-99
>
> Mac Linux:
> I don't know the specific model. 400MHz G3, 512MB, single IDE disk,
> ext2 filesystem
> Debian GNU/Linux 2.4.16-powerpc
>
> HP Intel Linux:
> ProLiant DL380 G3, 2 x 3GHz Xeon, 2GB RAM, SmartArray 5i 64MB cache,
> 2 x 15,000RPM 36GB U320 SCSI drives mirrored. I'm not sure if
> writes are cached or not. There's no battery backup.
> ext3 filesystem.
> Red Hat Enterprise Linux 3.0, kernel based on 2.4.21
>
> Dell Intel OpenBSD:
> Poweredge ?, single 1GHz PIII, 128MB RAM, single 7200RPM 80GB IDE disk,
> ffs filesystem
> OpenBSD 3.2 GENERIC kernel
>
> SUN Ultra2:
> Ultra2, 2 x 296MHz UltraSPARC II, 2GB RAM, 2 x 10,000RPM 18GB U160
> SCSI drives mirrored with Solstice DiskSuite. UFS filesystem.
> Solaris 8.
>
> SUN E4500 + HDS Thunder 9570v:
> E4500, 8 x 400MHz UltraSPARC II, 3GB RAM,
> HDS Thunder 9570v, 2GB mirrored battery-backed cache, RAID5 with a
> bunch of 146GB 10,000RPM FC drives. LUN is on a single 2Gb FC fabric
> connection.
> Veritas filesystem (VxFS)
> Solaris 8.
>
> Test methodology:
>
> All test runs were done with CHUNKSIZE 8 * 1024, CHUNKS 2 * 1024,
> FILESIZE_MULTIPLIER 2, and SLEEP 5. So a total of 16MB was sequentially
> written for each benchmark.
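
For readers comparing the figures below, the arithmetic behind these settings can be sketched in C. This is only an illustration: the macro names mirror the settings quoted above, while total_bytes, mib_per_sec and us_per_chunk are hypothetical helpers and are not taken from the attached program.

```c
/* Illustrative arithmetic only; helper names are hypothetical. */

#define CHUNKSIZE (8 * 1024)   /* bytes per write, from the quoted settings */
#define CHUNKS    (2 * 1024)   /* writes per run, from the quoted settings */

/* Total bytes written in one benchmark run. */
long total_bytes(void)
{
    return (long) CHUNKSIZE * CHUNKS;
}

/* Throughput in MiB/s for a run that took elapsed_us microseconds. */
double mib_per_sec(long elapsed_us)
{
    double mib = total_bytes() / (1024.0 * 1024.0);

    return mib / (elapsed_us / 1e6);
}

/* Mean microseconds per write, regardless of when syncs were issued. */
double us_per_chunk(long elapsed_us)
{
    return (double) elapsed_us / CHUNKS;
}
```

total_bytes() comes to 16777216 bytes, the 16MB mentioned above. Feeding in the Mac Linux open_datasync result, mib_per_sec(763074) works out to roughly 21 MiB/s, while the HP Intel Linux open_sync result, mib_per_sec(8783417), is roughly 1.8 MiB/s.
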
>
> Results are in microseconds.
>
> PLATFORM: Athlon Linux
> buffered: 48220
> fsync: 74854397
> fdatasync: 75061357
> open_sync: 73869239
> open_datasync: 74748145
> Notes: System mostly idle. Even during tests, top showed about 95%
> idle. Something's not right on this box. All sync methods are similarly
> horrible on this system.
>
> PLATFORM: Mac Linux
> buffered: 58912
> fsync: 1539079
> fdatasync: 769058
> open_sync: 767094
> open_datasync: 763074
> Notes: system mostly idle. fsync seems worst. Otherwise, they seem
> pretty equivalent. This is the fastest system tested.
>
> PLATFORM: HP Intel Linux
> buffered: 33026
> fsync: 29330067
> fdatasync: 28673880
> open_sync: 8783417
> open_datasync: 8747971
> Notes: system idle. O_SYNC and O_DSYNC methods seem to be a lot
> better on this platform than fsync & fdatasync.
>
> PLATFORM: Dell Intel OpenBSD
> buffered: 511890
> fsync: 1769190
> fdatasync: --------
> open_sync: 1748764
> open_datasync: 1747433
> Notes: system idle. I couldn't locate fdatasync() on this box, so I
> couldn't test it. All sync methods seem equivalent and are very fast,
> though they still trail the old Mac.
>
> PLATFORM: SUN Ultra2
> buffered: 1814824
> fsync: 73954800
> fdatasync: 52594532
> open_sync: 34405585
> open_datasync: 13883758
> Notes: system mostly idle, with occasional spikes from 1-10% utilization.
> There appears to be a substantial difference between each sync method, with
> O_DSYNC the best and fsync() the worst. There is also a substantial
> difference between the open* and f* methods.
>
> PLATFORM: SUN E4500 + HDS Thunder 9570v
> buffered: 233947
> fsync: 57802065
> fdatasync: 56631013
> open_sync: 2362207
> open_datasync: 1976057
> Notes: host about 30% idle, but the array tested on was completely idle.
> Something looks seriously not right about fsync and fdatasync -- write
> cache seems to have no effect on them. As for write cache, that
> probably explains the 2 seconds or so for the open_sync and
> open_datasync methods.
>
> --------------
>
> Thanks for reading...I look forward to feedback, and hope to be helpful in
> this effort!
>
> Mark
>

[ Attachment, skipping... ]
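
Since the attachment is not reproduced here, the following C sketch gives a rough idea of the kind of test described above: time a run of sequential writes under one sync method and report microseconds. It is not Mark's program and not the code in src/tools/fsync. The names sync_method and time_writes are hypothetical, and it assumes a sync after every write, which the original may or may not do. Only fsync() and O_SYNC are shown; fdatasync() and O_DSYNC would follow the same pattern where the platform provides them.

```c
/* Sketch only: all names below are hypothetical, not from the attachment. */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

enum sync_method { BUFFERED, USE_FSYNC, USE_OPEN_SYNC };

/*
 * Write `chunks` blocks of `chunk_size` bytes sequentially to `path`.
 * Returns elapsed wall-clock microseconds, or -1 on any error.
 */
long time_writes(const char *path, enum sync_method m,
                 size_t chunk_size, int chunks)
{
    int flags = O_WRONLY | O_CREAT | O_TRUNC;
    struct timeval start, stop;
    long result = 0;
    char *buf;
    int fd, i;

    if (m == USE_OPEN_SYNC)
        flags |= O_SYNC;            /* every write() returns only after sync */

    buf = malloc(chunk_size);
    if (buf == NULL)
        return -1;
    memset(buf, 'x', chunk_size);

    fd = open(path, flags, 0600);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    gettimeofday(&start, NULL);
    for (i = 0; i < chunks; i++) {
        if (write(fd, buf, chunk_size) != (ssize_t) chunk_size) {
            result = -1;
            break;
        }
        /* Assumption: sync after each write, as a WAL flush would. */
        if (m == USE_FSYNC && fsync(fd) != 0) {
            result = -1;
            break;
        }
    }
    gettimeofday(&stop, NULL);

    close(fd);
    free(buf);
    if (result < 0)
        return -1;
    return (stop.tv_sec - start.tv_sec) * 1000000L
         + (stop.tv_usec - start.tv_usec);
}
```

A call such as time_writes("testfile", USE_FSYNC, 8 * 1024, 2 * 1024) would correspond to the CHUNKSIZE and CHUNKS settings quoted above. Results on an ordinary desktop disk with write caching enabled should be read with the same caution as the numbers in the original message.
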


--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
