Re: Spread checkpoint sync

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Spread checkpoint sync
Date: 2010-12-05 05:56:51
Message-ID: 4CFB29A3.1060002@2ndquadrant.com
Lists: pgsql-hackers

Greg Stark wrote:
> Using sync_file_range you can specify the set of blocks to sync and
> then block on them only after some time has passed. But there's no
> documentation on how this relates to the I/O scheduler so it's not
> clear it would have any effect on the problem.

As far as I understand it, this is exactly where we're stalled on getting
this improved on the Linux side. *The*
answer for this class of problem on Linux is to use sync_file_range, and
I don't think we'll ever get any sympathy from those kernel developers
until we do. But that's a Linux-specific call, so doing that is going
to add a write path fork with platform-specific code into the database.
If I thought sync_file_range was a silver bullet guaranteed to make this
better, maybe I'd go for that. I think there's some relatively
low-hanging fruit on the database side that would do better before going
to that extreme though, thus the patch.
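
To make the "specify the blocks, then block on them later" idea concrete,
here is a minimal sketch of what such a platform-specific fork could look
like. This is not code from the patch. The helper names
start_range_writeback and finish_range_writeback are hypothetical, and the
non-Linux branch simply assumes the later fsync does all the work. Note
that sync_file_range does not flush file metadata, so the sketch still
ends with a regular fsync.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* needed for sync_file_range() on glibc */
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to start writing out the dirty pages
 * in [offset, offset + nbytes) without waiting for them to reach disk.
 * Returns 0 on success, -1 on error (errno set by the kernel).
 */
static int
start_range_writeback(int fd, off_t offset, off_t nbytes)
{
#ifdef __linux__
    return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
#else
    /* Assumption: no equivalent call here, so leave it all to fsync. */
    (void) fd;
    (void) offset;
    (void) nbytes;
    return 0;
#endif
}

/*
 * Hypothetical helper: some time later, wait for that range and then make
 * the file durable. The ranged wait is treated as advisory; fsync() is what
 * actually provides the guarantee, so only its result is returned.
 */
static int
finish_range_writeback(int fd, off_t offset, off_t nbytes)
{
#ifdef __linux__
    (void) sync_file_range(fd, offset, nbytes,
                           SYNC_FILE_RANGE_WAIT_BEFORE |
                           SYNC_FILE_RANGE_WRITE |
                           SYNC_FILE_RANGE_WAIT_AFTER);
#else
    (void) offset;
    (void) nbytes;
#endif
    return fsync(fd);
}
```

The caller would issue start_range_writeback for each file when the
checkpoint writes finish, pause, and then call finish_range_writeback.
Whether the pause actually smooths out the I/O depends on how the kernel's
I/O scheduler treats the early writeback, which is the undocumented part
raised above.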

> We might still have to delay the beginning of the sync to allow the dirty
> blocks to be synced naturally and then when we issue it still end up
> catching a lot of other i/o as well.

Whether it's "lots" or not is really workload dependent. I work from
the assumption that the blocks being written out by the checkpoint are
the most popular ones in the database, the ones that accumulate a high
usage count and stay there. If that's true, my guess is that the writes
being done while the checkpoint is executing are a bit less likely to be
touching the same files. You raise a valid concern; I just haven't seen
that actually happen in practice yet.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
