Re: sync_file_range()

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: sync_file_range()
Date: 2006-06-19 19:04:39
Message-ID: 8764ix55mg.fsf@stark.xeocode.com
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:

> On Mon, 2006-06-19 at 15:32 +0800, Qingqing Zhou wrote:
> > "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote
> > >
> > >
> > > I'm interested in it, with which we could improve responsiveness during
> > > checkpoints. Though it is a Linux-specific system call, we could use
> > > the combination of mmap() and msync() instead; I mean we can use
> > > mmap only to flush dirty pages, not to read or write pages.
> > >
> >
> > Can you specify details? As the TODO item indicates, if we mmap the data
> > file, a serious problem is that we don't know when the data pages hit the
> > disks -- so we may violate the WAL rule.
>
> Can't see where we'd use it.
>
> We fsync the xlog at transaction commit, so only the leading edge needs
> to be synced - would the call help there? Presumably the OS can already
> locate all blocks associated with a particular file fairly quickly
> without doing a full cache scan.

Well, in theory the transaction being committed isn't necessarily the "leading
edge" -- there could be more WAL from other transactions written since the last
record this transaction actually wrote. However, I can't see that actually
helping performance much, if at all. There can't be much extra data, and when
writing it out it doesn't really matter much how much data gets written -- what
really matters is rotational and seek latency anyway.
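
For what it's worth, if we did want to sync just the tail of an xlog segment,
the call would look something like this -- purely a sketch, assuming we keep
track of a "last synced" and a "written up to" offset ourselves, and noting
that sync_file_range() is Linux-only (2.6.17+) and doesn't flush file metadata
or the drive's write cache:

    /*
     * Sketch only: flush the byte range of an xlog segment written since
     * the last sync, instead of fsync()ing the whole file.  The offsets
     * (last_synced, write_upto) are hypothetical bookkeeping.
     */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>

    static int
    sync_xlog_range(int fd, off_t last_synced, off_t write_upto)
    {
        /*
         * WAIT_BEFORE + WRITE + WAIT_AFTER acts like a range fsync of the
         * data pages, but covers neither inode metadata nor the disk cache.
         */
        unsigned int flags = SYNC_FILE_RANGE_WAIT_BEFORE |
                             SYNC_FILE_RANGE_WRITE |
                             SYNC_FILE_RANGE_WAIT_AFTER;

        if (sync_file_range(fd, last_synced,
                            write_upto - last_synced, flags) != 0)
        {
            perror("sync_file_range");
            return -1;
        }
        return 0;
    }

But as I said, I doubt the win would be measurable for the xlog case.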

> Other files are fsynced at checkpoint - always all dirty blocks in the
> whole file.

Well, couldn't it be useful for checkpoints if there were some way to know
which buffers had been touched since the last checkpoint? There could be a lot
of buffers dirtied since the checkpoint began, and those don't really need to
be synced, do they?

Or it could be used to control the rate at which the files are checkpointed.
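
Something along these lines, say -- just a sketch; the 1MB chunk size and the
sleep are invented throttling knobs, not anything we do today, and the final
fsync() is still needed to cover metadata and to actually wait for everything:

    /*
     * Sketch only: trickle a data file out during a checkpoint by asking
     * the kernel to start writeback of fixed-size chunks, pausing between
     * chunks, then finishing with an ordinary fsync().
     */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/stat.h>

    #define CHUNK (1024 * 1024)     /* write out 1MB at a time (arbitrary) */

    static int
    checkpoint_file_slowly(int fd)
    {
        struct stat st;
        off_t       offset;

        if (fstat(fd, &st) != 0)
            return -1;

        for (offset = 0; offset < st.st_size; offset += CHUNK)
        {
            /* kick off async writeback of this chunk, don't wait for it */
            if (sync_file_range(fd, offset, CHUNK,
                                SYNC_FILE_RANGE_WRITE) != 0)
                return -1;
            usleep(10000);          /* crude throttle between chunks */
        }

        /* still need a real fsync to catch metadata and wait for it all */
        return fsync(fd);
    }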

Come to think of it, I wonder whether there's anything to be gained by using
smaller files for tables -- instead of 1G segments, maybe 256M segments or
something like that, to reduce the hit of fsyncing a file.

--
greg
