From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Florian Weimer <fweimer(at)bfk(dot)de>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: checkpoint writeback via sync_file_range
Date: 2012-01-11 12:51:38
Message-ID: 201201111351.38738.andres@anarazel.de
Lists: pgsql-hackers
On Wednesday, January 11, 2012 10:33:47 AM Florian Weimer wrote:
> * Greg Smith:
> > One idea I was thinking about here was building a little hash table
> > inside of the fsync absorb code, tracking how many absorb operations
> > have happened for whatever the most popular relation files are. The
> > idea is that we might say "use sync_file_range every time <N> calls
> > for a relation have come in", just to keep from ever accumulating too
> > many writes to any one file before trying to nudge some of it out of
> > there. The bat that keeps hitting me in the head here is that right
> > now, a single fsync might have a full 1GB of writes to flush out,
> > perhaps because it extended a table and then wrote more than that to
> > it. And in everything but a SSD or giant SAN cache situation, 1GB of
> > I/O is just too much to fsync at a time without the OS choking a
> > little on it.
>
> Isn't this pretty much like tuning vm.dirty_bytes? We generally set it
> to pretty low values, and it seems to help smooth out the checkpoints.
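For concreteness, here is a minimal standalone sketch of the per-file "nudge" idea quoted above. Everything in it (the counter table, NUDGE_THRESHOLD, the helper name) is invented for illustration, a flat array stands in for the hash table, and none of it corresponds to actual PostgreSQL internals:

/*
 * Sketch: count absorbed fsync requests per file and, once a file has
 * accumulated NUDGE_THRESHOLD of them, ask the kernel to start
 * writeback so the eventual fsync() has less dirty data to flush.
 */
#define _GNU_SOURCE
#include <fcntl.h>

#define MAX_TRACKED     256
#define NUDGE_THRESHOLD 32      /* the "<N>" from the quoted proposal */

struct absorb_count
{
    int fd;         /* descriptor of the relation segment; 0 = unused slot */
    int nabsorbed;  /* absorbed requests since the last nudge */
};

static struct absorb_count counts[MAX_TRACKED];

/* Record one absorbed request; nudge the kernel when the count gets high. */
static void
absorb_one_request(int fd)
{
    for (int i = 0; i < MAX_TRACKED; i++)
    {
        if (counts[i].fd == fd || counts[i].fd == 0)
        {
            counts[i].fd = fd;
            if (++counts[i].nabsorbed >= NUDGE_THRESHOLD)
            {
                /*
                 * Start asynchronous writeback of the whole file
                 * (nbytes == 0 means "through end of file"); don't wait.
                 */
                (void) sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE);
                counts[i].nabsorbed = 0;
            }
            return;
        }
    }
}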
If done correctly (in a way that is much more invasive), you could issue
sync_file_range calls only for the areas of the file where checkpointing
actually needs to happen, and leave out e.g. hint-bit-only changes. That
could help reduce the cost of checkpoints.
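To illustrate what that targeted variant might look like, a hedged sketch follows; the dirty_range list, its needs_checkpoint flag, and the helper name are all hypothetical, and only sync_file_range() itself and PostgreSQL's 8kB default block size are real:

/*
 * Sketch: instead of flushing whole files, start writeback only over
 * the byte ranges the checkpoint actually needs durable, skipping
 * ranges that were dirtied by hint-bit-only changes.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

#define BLCKSZ 8192                 /* PostgreSQL's default block size */

struct dirty_range
{
    off_t blkno;                    /* first dirty block in the run */
    off_t nblocks;                  /* length of the dirty run */
    bool  needs_checkpoint;         /* false for hint-bit-only changes */
};

/* Kick off asynchronous writeback for just the checkpoint's ranges. */
static void
writeback_checkpoint_ranges(int fd, const struct dirty_range *ranges,
                            size_t nranges)
{
    for (size_t i = 0; i < nranges; i++)
    {
        if (!ranges[i].needs_checkpoint)
            continue;               /* leave hint-bit-only pages alone */

        (void) sync_file_range(fd,
                               ranges[i].blkno * BLCKSZ,
                               ranges[i].nblocks * BLCKSZ,
                               SYNC_FILE_RANGE_WRITE);
    }
}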
Andres