Re: double-buffering page writes

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: double-buffering page writes
Date: 2008-10-23 16:41:44
Message-ID: 4900A948.1000606@enterprisedb.com
Lists: pgsql-hackers

Alvaro Herrera wrote:
> ITAGAKI Takahiro wrote:
>
>> I have some comments about the double-buffering:
>
> Since posting this patch I have realized that this implementation is
> bogus. I'm now playing with WAL-logging hint bits though.

Yeah, the torn page + hint bit updates problem is the tough question.

>> - Is it ok to allocate dblbuf[BLCKSZ] as a local variable?
>> It might be unaligned. AFAICS we avoid such usages in other places.
>
> I thought about that too. I admit I am not sure if this really works
> portably; however I don't want to add a palloc() to that routine.

It should work, AFAIK, but unaligned memcpy()s and write()s can be
significantly slower. There can be only one write() happening at any
time, so you could just palloc() a single 8k buffer in TopMemoryContext
in backend startup, and always use that.
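
A minimal standalone sketch of that "allocate once, reuse for every write"
idea follows. It is not the patch's code: C11 aligned_alloc() stands in for
a palloc() in TopMemoryContext, and the names get_dblbuf(),
copy_page_for_write() and the 4096-byte DBLBUF_ALIGN are assumptions made
for illustration only.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BLCKSZ       8192   /* PostgreSQL's default block size */
#define DBLBUF_ALIGN 4096   /* assumed alignment, not taken from the patch */

/* One buffer per process, allocated on first use and never freed. */
static char *dblbuf = NULL;

/* Return the shared aligned buffer, allocating it the first time. */
static char *
get_dblbuf(void)
{
    if (dblbuf == NULL)
        dblbuf = aligned_alloc(DBLBUF_ALIGN, BLCKSZ);  /* NULL on failure */
    return dblbuf;
}

/*
 * Copy a page into the shared buffer and return the copy, which the caller
 * would then hand to write(). This is safe only because a process issues
 * one such write at a time.
 */
static const char *
copy_page_for_write(const char *page)
{
    char *buf = get_dblbuf();

    if (buf == NULL)
        return NULL;
    memcpy(buf, page, BLCKSZ);
    return buf;
}
```

The point of the design is that the allocation cost is paid once per
process, so the write path itself never allocates and the buffer's
alignment is known.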

>> - Are there any other modules that can share in the benefits of
>> double-buffering? For example, we could avoid waiting for
>> LockBufferForCleanup(). It is cool if the double-buffering can
>> be used for multiple purposes.
>
> Not sure on this.

You'd need to keep both versions of the buffer simultaneously in the
buffer cache for that. When we talked about the various designs for HOT,
that was one of the ideas I had to enable more aggressive pruning: if
you can't immediately get a vacuum lock, allocate a new buffer in the
buffer cache for the same block, copy the page to the new buffer, and do
the pruning, including moving tuples around, there. Any new ReadBuffer
calls would return the new page version, but old readers would keep
referencing the old one. The intrusive part of that approach, in
addition to the obvious changes required in the buffer manager to keep
around multiple copies of the same block, is that all modifications must
be done on the new version, so anyone who needs to lock the page for
modification would need to switch to the new page version at the
LockBuffer call.
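
To make the moving parts concrete, here is a toy model of that scheme. It
is only a sketch: none of it is PostgreSQL's buffer manager API, the
PageVersion struct and every toy_* function are hypothetical, and real
locking, eviction and WAL are ignored entirely.

```c
#include <stddef.h>
#include <string.h>

#define BLCKSZ 8192

/* One in-cache copy of a block; several copies may exist at once. */
typedef struct PageVersion
{
    char                data[BLCKSZ];
    int                 pins;    /* references still held on this copy */
    struct PageVersion *newer;   /* NULL if this is the latest copy */
} PageVersion;

/* Follow the chain to the latest copy of the block. */
static PageVersion *
latest_version(PageVersion *v)
{
    while (v->newer != NULL)
        v = v->newer;
    return v;
}

/* New readers always pin the latest copy. */
static PageVersion *
toy_read_buffer(PageVersion *block)
{
    PageVersion *v = latest_version(block);

    v->pins++;
    return v;
}

/*
 * The pruner could not get the vacuum lock on 'old': copy the page into a
 * fresh slot and link it as the newer version. Pruning then happens in
 * 'fresh' while existing readers keep looking at 'old'.
 */
static PageVersion *
toy_fork_for_prune(PageVersion *old, PageVersion *fresh)
{
    memcpy(fresh->data, old->data, BLCKSZ);
    fresh->pins = 0;
    fresh->newer = NULL;
    old->newer = fresh;
    return fresh;
}

/*
 * Anyone locking for modification must switch to the latest copy, moving
 * their pin from the old version to the new one.
 */
static PageVersion *
toy_lock_for_modify(PageVersion *pinned)
{
    PageVersion *v = latest_version(pinned);

    if (v != pinned)
    {
        pinned->pins--;
        v->pins++;
    }
    return v;
}
```

Even in this toy form, the intrusive part is visible: toy_lock_for_modify()
can hand back a different page than the one the caller had pinned, so every
caller that modifies pages has to cope with that switch.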

As discussed in the other thread with Simon, we also use vacuum locks in
b-tree for waiting out index scans, so avoiding the waiting there would
be just wrong.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
