Re: Weird XFS WAL problem

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Craig James <craig_james(at)emolecules(dot)com>, Matthew Wakeling <matthew(at)flymine(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Weird XFS WAL problem
Date: 2010-07-07 14:42:55
Message-ID: 201007071442.o67EgtU26973@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Greg Smith wrote:
> Kevin Grittner wrote:
> > I don't know at the protocol level; I just know that write barriers
> > do *something* which causes our controllers to wait for actual disk
> > platter persistence, while fsync does not
>
> It's in the docs now:
> http://www.postgresql.org/docs/9.0/static/wal-reliability.html
>
> FLUSH CACHE EXT is the ATAPI-6 call that filesystems use to enforce
> barriers on that type of drive. Here's what the relevant portion of the
> ATAPI spec says:
>
> "This command is used by the host to request the device to flush the
> write cache. If there is data in the write
> cache, that data shall be written to the media.The BSY bit shall remain
> set to one until all data has been
> successfully written or an error occurs."
>
> SAS systems have a similar call named SYNCHRONIZE CACHE.
>
> The improvement I actually expect to arrive here first is a reliable
> implementation of O_SYNC/O_DSYNC writes. Both SAS and SATA drives that
> capable of doing Native Command Queueing support a write type called
> "Force Unit Access", which is essentially just like a direct write that
> cannot be cached. When we get more kernels with reliable sync writing
> that maps under the hood to FUA, and can change wal_sync_method to use
> them, the need to constantly call fsync for every write to the WAL will
> go away. Then the "blow out the RAID cache when barriers are on"
> behavior will only show up during checkpoint fsyncs, which will make
> things a lot better (albeit still not ideal).

Great information! I have added the attached documentation patch to
explain the write-barrier/BBU interaction. This will appear in the 9.0
documentation.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +

Attachment Content-Type Size
/rtmp/diff text/x-diff 4.4 KB

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Richard Yen 2010-07-07 16:39:11 Re: [Slony1-general] WAL partition overloaded--by autovacuum?
Previous Message damien hostin 2010-07-07 14:39:38 Re: Slow query with planner row strange estimation