Re: Experimental patch for inter-page delay in VACUUM

From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ang Chin Han <angch(at)bytecraft(dot)com(dot)my>, Christopher Browne <cbbrowne(at)acm(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Experimental patch for inter-page delay in VACUUM
Date: 2003-11-04 16:28:10
Message-ID: 3FA7D39A.8060502@Yahoo.com
Lists: pgsql-hackers

Tom Lane wrote:

> Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
>> Tom Lane wrote:
>>> I have never been happy with the fact that we use sync(2) at all.
>
>> Sure, it does too much. But together with the other layer of
>> indirection, the virtual file descriptor pool, what is the exact
>> guaranteed behaviour of
>> write(); close(); open(); fsync();
>> cross-platform?
>
> That isn't guaranteed, which is why we have to use sync() at the
> moment. To go over to fsync or O_SYNC we'd need more control over which
> file descriptors are used to issue writes. Which is why I was thinking
> about moving the writes to a centralized writer process.
>
>>> Actually, once you build it this way, you could make all writes
>>> synchronous (open the files O_SYNC) so that there is never any need for
>>> explicit fsync at checkpoint time.
>
>> Yes, but then the configuration leans more towards "take over the RAM"
>
> Why? The idea is to try to issue writes at a fairly steady rate, which
> strikes me as much better than the current behavior. I don't see why it
> would force you to have large numbers of buffers available. You'd want
> a few thousand, no doubt, but that's not a large number.

That is part of the idea. The whole idea is to issue "physical" writes
at a fairly steady rate without increasing their number substantially
or interfering too much with the drive's opinion about their order. I
think O_SYNC for random access can conflict with write reordering.
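
In code, the contrast looks roughly like this (just a sketch under
POSIX semantics; the file name, the offsets, and the missing error
checking are for illustration only):

#include <fcntl.h>
#include <unistd.h>

#define BLKSZ 8192

int main(void)
{
    char page[BLKSZ] = {0};

    /* With O_SYNC every write(2) blocks until the block is on disk,
     * so the blocks hit the platter in exactly this submission order. */
    int fd = open("datafile", O_RDWR | O_SYNC | O_CREAT, 0600);
    pwrite(fd, page, BLKSZ, (off_t) 7 * BLKSZ);  /* random offsets, */
    pwrite(fd, page, BLKSZ, (off_t) 1 * BLKSZ);  /* forced out one by one */
    close(fd);

    /* With buffered writes plus one fsync(2), the kernel and the drive
     * are free to reorder the pending blocks before fsync returns. */
    fd = open("datafile", O_RDWR);
    pwrite(fd, page, BLKSZ, (off_t) 7 * BLKSZ);
    pwrite(fd, page, BLKSZ, (off_t) 1 * BLKSZ);
    fsync(fd);
    close(fd);

    return 0;
}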

The way I see the background writer operating is that it keeps the
buffers at the LRU end of the chain(s) clean, because those are the
buffers most likely to be replaced soon. In my experimental ARC code it
would traverse the T1 and T2 queues from LRU to MRU, write out n1 and
n2 dirty buffers respectively (n1+n2 configurable), then fsync all
files that were involved in that, nap for a time depending on how far
down the queues it got (to increase the write rate when running low on
clean buffers), and do it all over again.
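
In rough code, the writer's loop would look something like this (a
sketch only; the structures and helper names here are invented for
illustration and are not the actual ARC code):

/* All names below are hypothetical, not real buffer manager code. */
#include <stdbool.h>
#include <unistd.h>

typedef struct Buffer
{
    struct Buffer *next;        /* toward the MRU end of the queue */
    int            fd;          /* file this buffer belongs to */
    bool           dirty;
} Buffer;

extern Buffer *t1_lru, *t2_lru; /* LRU ends of the ARC T1/T2 queues */
extern int     n1, n2;          /* configurable per-round write quotas */

extern void write_buffer(Buffer *buf);        /* issue the write(2) */
extern void remember_file(int fd);            /* collect fds touched */
extern void fsync_remembered_files(void);     /* one fsync per file */
extern long nap_microseconds(int cleaned);    /* shorter nap when low */

/* Walk one queue from LRU toward MRU, cleaning up to quota buffers. */
static int
clean_queue(Buffer *lru, int quota)
{
    int written = 0;

    for (Buffer *buf = lru; buf != NULL && written < quota; buf = buf->next)
    {
        if (buf->dirty)
        {
            write_buffer(buf);
            remember_file(buf->fd);
            buf->dirty = false;
            written++;
        }
    }
    return written;
}

void
background_writer_loop(void)
{
    for (;;)
    {
        int cleaned = clean_queue(t1_lru, n1) + clean_queue(t2_lru, n2);

        /* One fsync per file involved in this round. */
        fsync_remembered_files();

        /* Nap; how long depends on how far down the queues we got, so
         * the write rate goes up when clean buffers are running low. */
        usleep(nap_microseconds(cleaned));
    }
}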

With that scheme, every other process doing a write must issue an fsync
too, because it is not guaranteed that the fsync of one process flushes
the writes of another. But as you said, if that is a relatively rare
operation for a regular backend, it won't hurt.
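
For a regular backend that means pairing its own write with its own
fsync on the same descriptor, roughly (again just a sketch, names
hypothetical):

#include <sys/types.h>
#include <unistd.h>

/* A backend that needs its write on disk now cannot count on the
 * writer process's fsync covering it, so it flushes through its own
 * file descriptor. */
static void
backend_write_durable(int fd, const void *page, size_t len, off_t offset)
{
    if (pwrite(fd, page, len, offset) == (ssize_t) len)
        fsync(fd);
}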

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #
