Re: checkpoint writeback via sync_file_range

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Florian Weimer <fweimer(at)bfk(dot)de>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: checkpoint writeback via sync_file_range
Date: 2012-01-11 12:51:38
Message-ID: 201201111351.38738.andres@anarazel.de

On Wednesday, January 11, 2012 10:33:47 AM Florian Weimer wrote:
> * Greg Smith:
> > One idea I was thinking about here was building a little hash table
> > inside of the fsync absorb code, tracking how many absorb operations
> > have happened for whatever the most popular relation files are. The
> > idea is that we might say "use sync_file_range every time <N> calls
> > for a relation have come in", just to keep from ever accumulating too
> > many writes to any one file before trying to nudge some of it out of
> > there. The bat that keeps hitting me in the head here is that right
> > now, a single fsync might have a full 1GB of writes to flush out,
> > perhaps because it extended a table and then wrote more than that to
> > it. And in everything but a SSD or giant SAN cache situation, 1GB of
> > I/O is just too much to fsync at a time without the OS choking a
> > little on it.
>
> Isn't this pretty much like tuning vm.dirty_bytes? We generally set it
> to pretty low values, and it seems to help smooth out the checkpoints.
If done correctly (and in a much more invasive way) you could issue
sync_file_range's only for the areas of the file that the checkpoint actually
needs to write out, and leave out e.g. hint-bit-only changes. That could help
reduce the cost of checkpoints.

Andres
