Re: O_DSYNC broken on MacOS X?

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: "A(dot)M(dot)" <agentm(at)themactionfaction(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: O_DSYNC broken on MacOS X?
Date: 2010-10-07 23:29:10
Message-ID: 4CAE57C6.40400@2ndquadrant.com
Lists: pgsql-hackers

A.M. wrote:
> Perhaps a simpler tool could run a basic fsyncs-per-second test and prompt the DBA to check that the numbers are within the realm of possibility.
>

This is what the test_fsync utility that already ships with the database
should be useful for. The way Bruce changed it to report numbers in
commits/second for 9.0 makes it a lot easier to use for this purpose
than it used to be. I think there are still some additional improvements
that could be made there, but it's a tricky test to run accurately. The
current code is probably too detailed in some ways (it delivers a lot of
output not relevant to this use-case) and not detailed enough in
others. Providing a summary that understands things like
fsync_writethrough on platforms that support it was the first
refactoring I had in mind. If that thing came back and said
"fsync_writethrough works for you, so don't even consider the other
possibilities if you want reliability even though they are faster", that
would be nice for example.
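As a rough illustration of the "realm of possibility" check A.M. describes, here is a minimal Python sketch. It is not test_fsync itself and does not mimic its output. It times repeated write-plus-sync calls on a single block and compares the rate against the rotational ceiling of a drive with no write cache: a 7200 RPM disk can honestly complete at most about 7200 / 60 = 120 commits per second. The function names and the 1.5x slack threshold are hypothetical choices for this sketch, and it assumes a POSIX system (it uses os.pwrite and the fcntl module). On macOS it can optionally use F_FULLFSYNC, the call fsync_writethrough relies on there, when the constant is available.

```python
import fcntl
import os
import tempfile
import time


def sync(fd: int, writethrough: bool = False) -> None:
    """fsync, or on macOS optionally F_FULLFSYNC to also flush the drive cache."""
    full = getattr(fcntl, "F_FULLFSYNC", None)  # only defined on macOS builds
    if writethrough and full is not None:
        fcntl.fcntl(fd, full)
    else:
        os.fsync(fd)


def fsync_rate(path: str, iterations: int = 200, block: int = 8192,
               writethrough: bool = False) -> float:
    """Rewrite the same block and sync it repeatedly; return syncs per second."""
    data = b"\0" * block
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.pwrite(fd, data, 0)  # overwrite offset 0 each time, a stand-in for a commit record
            sync(fd, writethrough)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return iterations / elapsed


def max_plausible_commits(rpm: int) -> float:
    """With no write cache, roughly one commit per platter rotation is the ceiling."""
    return rpm / 60.0


def looks_suspicious(measured: float, rpm: int, slack: float = 1.5) -> bool:
    # Hypothetical threshold: flag rates well above one commit per rotation.
    return measured > max_plausible_commits(rpm) * slack


if __name__ == "__main__":
    probe = os.path.join(tempfile.mkdtemp(), "fsync_probe")
    rate = fsync_rate(probe, iterations=100)
    print(f"{rate:.0f} syncs/sec; suspicious for 7200 RPM: {looks_suspicious(rate, 7200)}")
```

A rate in the thousands on a single 7200 RPM drive suggests a volatile write cache is absorbing the syncs. A plausible-looking rate does not prove the data is safe, though; that still takes a pull-the-plug test.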

> How else can a DBA today ensure that a commit is a commit?
>

You can't ensure a commit is a commit without running a pull-the-plug
test. And I think the best way to do that accurately is using a "remote
witness" server focusing on finding this particular problem to look for
glitches, rather than using the database as your test program and
seeing if you happen to hit corruption or not. The documentation for
9.0 now suggests running the diskchecker.pl program for this exact
purpose. I've seen enough reports of it finding even subtle cache loss
situations to believe that encouraging heavier use of that would be
enough to make people much safer than they typically are today. What we
probably need to do next is provide people with an exact walkthrough of
setting up and using the program, showing what a passing result looks
like, and what a failing one looks like.
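The remote witness idea can be sketched roughly as follows. This is a simplified, in-process model of the concept, assuming a client that appends sequence numbers to a log file; it is not diskchecker.pl's actual protocol or file format. The key rule is that the witness only hears about a record after fsync has returned. So after the plug is pulled and the machine rebooted, any acknowledged number missing from disk is a commit the storage claimed was durable but lost. Witness, client_step, read_survivors and lost_commits are hypothetical names. In a real test the witness runs on a second machine that stays powered and is contacted over the network.

```python
import os
from typing import Iterable, List


class Witness:
    """Hypothetical stand-in for the remote witness. In a real test this runs on a
    separate machine that keeps power while the plug is pulled on the client."""

    def __init__(self) -> None:
        self.acknowledged: List[int] = []

    def record(self, seq: int) -> None:
        self.acknowledged.append(seq)


def client_step(fd: int, seq: int, witness: Witness) -> None:
    """Append one record, force it to disk, and only then tell the witness."""
    os.write(fd, f"{seq}\n".encode())
    os.fsync(fd)
    witness.record(seq)  # ordering matters: acknowledge only after fsync returns


def read_survivors(path: str) -> List[int]:
    """After reboot, collect the sequence numbers that actually reached disk."""
    with open(path, errors="replace") as f:
        # Ignore blank or non-numeric lines (e.g. zero-filled blocks) left by a crash.
        return [int(line) for line in f if line.strip().isdecimal()]


def lost_commits(acknowledged: Iterable[int], on_disk: Iterable[int]) -> List[int]:
    """Sequence numbers the witness was told were durable but that are missing."""
    survived = set(on_disk)
    return sorted(seq for seq in acknowledged if seq not in survived)
```

A passing run reports an empty list from lost_commits after every power cut; any non-empty result means some layer (drive cache, controller, or OS) acknowledged a write it had not made durable.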

--
Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
