| From: | Andres Freund <andres(at)anarazel(dot)de> |
|---|---|
| To: | pgsql-hackers(at)postgresql(dot)org |
| Cc: | Florian Weimer <fweimer(at)bfk(dot)de>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com> |
| Subject: | Re: checkpoint writeback via sync_file_range |
| Date: | 2012-01-11 12:51:38 |
| Message-ID: | 201201111351.38738.andres@anarazel.de |
| Lists: | pgsql-hackers |
On Wednesday, January 11, 2012 10:33:47 AM Florian Weimer wrote:
> * Greg Smith:
> > One idea I was thinking about here was building a little hash table
> > inside of the fsync absorb code, tracking how many absorb operations
> > have happened for whatever the most popular relation files are. The
> > idea is that we might say "use sync_file_range every time <N> calls
> > for a relation have come in", just to keep from ever accumulating too
> > many writes to any one file before trying to nudge some of it out of
> > there. The bat that keeps hitting me in the head here is that right
> > now, a single fsync might have a full 1GB of writes to flush out,
> > perhaps because it extended a table and then wrote more than that to
> > it. And in everything but an SSD or giant SAN cache situation, 1GB of
> > I/O is just too much to fsync at a time without the OS choking a
> > little on it.
>
> Isn't this pretty much like tuning vm.dirty_bytes? We generally set it
> to pretty low values, and that seems to help smooth out the checkpoints.
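
As a minimal sketch of the "nudge writes out early" idea quoted above (the function name, counter, and threshold are hypothetical, not PostgreSQL's actual fsync-absorb code; only sync_file_range() itself is the real Linux call): after every N absorbed writes to a file, SYNC_FILE_RANGE_WRITE asks the kernel to start writeback without waiting, so the eventual fsync() has less dirty data to flush in one burst.

```c
/* Hypothetical illustration, not PostgreSQL's fsync-absorb code. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>

#define NUDGE_EVERY_N_WRITES 64     /* hypothetical threshold */

/* In the quoted idea this counter would live in a per-relation-file hash entry. */
static int writes_since_nudge = 0;

void
maybe_nudge_writeback(int fd)
{
    if (++writes_since_nudge < NUDGE_EVERY_N_WRITES)
        return;
    writes_since_nudge = 0;

    /*
     * Start asynchronous writeback for the whole file (offset 0, nbytes 0
     * means "through the end of the file").  This does not wait for
     * completion and does not replace the eventual fsync(); it just
     * spreads the I/O out over time.
     */
    if (sync_file_range(fd, 0, 0, SYNC_FILE_RANGE_WRITE) != 0)
        perror("sync_file_range");
}
```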
If done correctly (and in a much more invasive way), you could issue
sync_file_range() only for the areas of the file that the checkpoint actually
needs to write, and leave out e.g. hint-bit-only changes. That could help
reduce the cost of checkpoints.
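
A hedged sketch of that range-limited approach (the range bookkeeping and function name are invented for illustration; only sync_file_range() is the real Linux call): kick off writeback just for the block ranges the checkpoint wrote, leaving e.g. hint-bit-only pages elsewhere in the file to the kernel's normal writeback, with a final fsync() still required for durability.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>

#define BLCKSZ 8192                 /* PostgreSQL's default block size */

/* Hypothetical bookkeeping: a contiguous run of blocks the checkpoint wrote. */
typedef struct CkptRange
{
    long    start_block;
    long    nblocks;
} CkptRange;

void
flush_checkpoint_ranges(int fd, const CkptRange *ranges, int nranges)
{
    for (int i = 0; i < nranges; i++)
    {
        off_t   offset = (off_t) ranges[i].start_block * BLCKSZ;
        off_t   nbytes = (off_t) ranges[i].nblocks * BLCKSZ;

        /* Kick off writeback for just this range; don't wait for it. */
        if (sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE) != 0)
            perror("sync_file_range");
    }

    /* A final fsync() at the end of the checkpoint is still required. */
}
```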
Andres