
Re: Experimental patch for inter-page delay in VACUUM

From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ang Chin Han <angch(at)bytecraft(dot)com(dot)my>,Christopher Browne <cbbrowne(at)acm(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Experimental patch for inter-page delay in VACUUM
Date: 2003-11-04 16:28:10
Lists: pgsql-hackers
Tom Lane wrote:

> Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
>> Tom Lane wrote:
>>> I have never been happy with the fact that we use sync(2) at all.
>> Sure does it do too much. But together with the other layer of 
>> indirection, the virtual file descriptor pool, what is the exact 
>> guaranteed behaviour of
>>      write(); close(); open(); fsync();
>> cross platform?
> That isn't guaranteed, which is why we have to use sync() at the
> moment.  To go over to fsync or O_SYNC we'd need more control over which
> file descriptors are used to issue writes.  Which is why I was thinking
> about moving the writes to a centralized writer process.
>>> Actually, once you build it this way, you could make all writes
>>> synchronous (open the files O_SYNC) so that there is never any need for
>>> explicit fsync at checkpoint time.
>> Yes, but then the configuration leans more towards "take over the RAM" 
> Why?  The idea is to try to issue writes at a fairly steady rate, which
> strikes me as much better than the current behavior.  I don't see why it
> would force you to have large numbers of buffers available.  You'd want
> a few thousand, no doubt, but that's not a large number.

That is part of the idea. The whole idea is to issue "physical" writes 
at a fairly steady rate without increasing their number substantially 
or interfering with the drive's opinion about their order too 
much. I think O_SYNC for random access can be in conflict with write 

The way I see the background writer operating is that it keeps the 
buffers clean in the order of the LRU chain(s), because those are the 
buffers most likely to be replaced soon. In my experimental ARC code 
it would traverse the T1 and T2 queues from LRU to MRU, write out n1 and 
n2 dirty buffers (n1+n2 configurable), then fsync all files that have 
been involved in that, nap for a time that depends on how far down the 
queues it got (to increase the write rate when running low on clean 
buffers), and do it all over again.
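
One pass of the loop described above could be sketched roughly as follows. This is only an illustration, not the experimental ARC code: the `Buffer` class, the `write_page` and `fsync_file` callbacks, and the `min_nap`/`max_nap` parameters are all hypothetical names, and the linear nap formula is just one assumed way to nap less when dirty buffers sit close to the LRU end.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    file_id: str   # hypothetical identifier of the file backing this buffer
    dirty: bool


def clean_queue(queue, limit, write_page):
    """Walk one queue from LRU (index 0) towards MRU and write out at most
    `limit` dirty buffers. Returns (files touched, buffers scanned)."""
    touched = set()
    written = 0
    scanned = 0
    for buf in queue:
        if written >= limit:
            break
        scanned += 1
        if buf.dirty:
            write_page(buf)
            buf.dirty = False
            touched.add(buf.file_id)
            written += 1
    return touched, scanned


def writer_pass(t1, t2, n1, n2, write_page, fsync_file,
                min_nap=0.01, max_nap=0.2):
    """One background-writer pass over the T1 and T2 queues.
    Returns the number of seconds to nap before the next pass."""
    touched1, scanned1 = clean_queue(t1, n1, write_page)
    touched2, scanned2 = clean_queue(t2, n2, write_page)

    # fsync every file that received a write during this pass.
    for file_id in sorted(touched1 | touched2):
        fsync_file(file_id)

    # Assumed policy: hitting the limits after scanning only a small part
    # of the queues means the LRU end is mostly dirty, so nap briefly.
    total = len(t1) + len(t2)
    depth = (scanned1 + scanned2) / total if total else 1.0
    return min_nap + (max_nap - min_nap) * depth
```

A caller would sleep for the returned interval and then run `writer_pass` again; n1 and n2 correspond to the configurable per-queue limits mentioned above.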

That way, everyone else doing a write must issue an fsync too, because 
it's not guaranteed that the fsync of one process flushes the writes of 
another. But as you said, if that is a relatively rare operation for a 
regular backend, it won't hurt.
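
For the regular-backend case, the point is simply that the process doing the write also does the fsync, on the same descriptor, before closing it. A minimal sketch, assuming a hypothetical helper `backend_write_page` and an illustrative 8 kB page size (nothing here is taken from the actual backend code):

```python
import os

PAGE_SIZE = 8192  # illustrative page size, an assumption for this sketch


def backend_write_page(path, page_no, data):
    """Write one page and flush it via the very descriptor that wrote it,
    so the result does not depend on some other process's fsync."""
    if len(data) != PAGE_SIZE:
        raise ValueError("data must be exactly one page")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        offset = page_no * PAGE_SIZE
        done = 0
        # pwrite may write fewer bytes than requested, so loop until done.
        while done < PAGE_SIZE:
            done += os.pwrite(fd, data[done:], offset + done)
        os.fsync(fd)
    finally:
        os.close(fd)
```

This costs one fsync per backend-issued write, which is acceptable only as long as such writes stay rare compared with what the background writer handles.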


# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck(at)Yahoo(dot)com #
