Re: synchronous commit vs. hint bits

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, simon(at)2ndquadrant(dot)com
Subject: Re: synchronous commit vs. hint bits
Date: 2011-12-01 16:47:52
Message-ID: CA+TgmoZZrLz9TaZs0kG74o3S3YU_XJNOW5dkq5gakQZ=4RmiMA@mail.gmail.com
Lists: pgsql-hackers
On Thu, Dec 1, 2011 at 9:58 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> Waiting until the other one completes is how it currently is
> implemented, but is it necessary from a correctness view?  It seems
> like the WALWriteLock only needs to protect the write, and not the
> sync (assuming the sync method allows those to be separate actions),
> and that there could be multiple fsync requests from different
> processes pending at the same time without a correctness problem.

I've wondered about that, too.  At least on Linux, the overhead of a
system call seems to be pretty low - e.g. the ridiculous number of
lseek calls we do on a pgbench -S doesn't seem to create much overhead
until the inode mutex starts to become contended; and that problem
should be fixed in Linux 3.2.  But I'm not sure if system calls are
similarly cheap on all platforms, or even if it's true on Linux for
fsync() in particular.
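
For concreteness, here is a toy model of the arrangement Jeff describes: the lock covers only the write, and each caller issues its own fsync after releasing it. This is only a sketch in Python, not PostgreSQL code; ToyWal, append_and_sync and demo are made-up names, and it assumes a sync method where write and sync are separate system calls.

```python
import os
import tempfile
import threading


class ToyWal:
    """Toy WAL file where the lock covers the write but not the fsync."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_APPEND)
        self.write_lock = threading.Lock()  # hypothetical stand-in for WALWriteLock
        self.written = 0                    # bytes handed to the kernel so far

    def append_and_sync(self, record):
        # Only the write is serialized, so records never interleave.
        with self.write_lock:
            view = memoryview(record)
            while view:
                view = view[os.write(self.fd, view):]
            self.written += len(record)
            end = self.written
        # The fsync runs outside the lock, so several may be pending at once.
        # Each one covers everything written before it was issued,
        # including this caller's own record.
        os.fsync(self.fd)
        return end

    def close(self):
        os.close(self.fd)


def demo(backends=4, records_each=25, record=b"x" * 16):
    """Run concurrent writers; return (bytes expected, bytes in the file)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    wal = ToyWal(path)
    try:
        def backend():
            for _ in range(records_each):
                wal.append_and_sync(record)

        threads = [threading.Thread(target=backend) for _ in range(backends)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return backends * records_each * len(record), os.path.getsize(path)
    finally:
        wal.close()
        os.remove(path)
```

The model says nothing about cost, which is the open question above; it only shows that overlapping fsync calls do not lose or reorder any writes.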

There's another possible approach here, too: instead of waiting to set
hint bits until the commit record hits the disk, we could allow the
hint bits to be set immediately, on the condition that we don't write
the page out until the commit record hits the disk.  Bumping the page LSN would
do that, but I think that might be problematic since setting hint bits
isn't WAL-logged.  If so, we could possibly fix that by storing a
second LSN for the page out of line, e.g. in the buffer descriptor.
That might be even faster than speeding up the WAL flush.
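
A rough sketch of the second-LSN idea, again as a toy model in Python rather than actual buffer manager code. ToyBuffer, ToyBufferManager and the hint_lsn field are all hypothetical names; the only point is that the page LSN is left alone and the out-of-line LSN is consulted before the page is written.

```python
from dataclasses import dataclass


@dataclass
class ToyBuffer:
    page_lsn: int = 0             # LSN stored in the page, moved only by WAL-logged changes
    hint_lsn: int = 0             # hypothetical second LSN, kept in the buffer descriptor
    hint_committed: bool = False  # stand-in for a "transaction committed" hint bit


class ToyBufferManager:
    def __init__(self, flushed_lsn=0):
        self.flushed_lsn = flushed_lsn  # how far WAL is known to be on disk
        self.wal_flushes = 0            # number of flushes forced by page writes

    def set_hint_bit(self, buf, commit_lsn):
        # Set the hint immediately, without waiting for the commit record
        # to reach disk, and remember which LSN the hint depends on.
        buf.hint_committed = True
        buf.hint_lsn = max(buf.hint_lsn, commit_lsn)

    def flush_wal(self, upto):
        if upto > self.flushed_lsn:
            self.flushed_lsn = upto
            self.wal_flushes += 1

    def write_page(self, buf):
        # Before the page goes out, WAL must be flushed past both LSNs.
        needed = max(buf.page_lsn, buf.hint_lsn)
        self.flush_wal(needed)
        return needed  # the page write itself would happen here
```

Setting the hint costs nothing up front in this model; the wait for the WAL flush is deferred to the moment the page is actually written, and is skipped entirely if the commit record is on disk by then.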

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
