Re: Spreading full-page writes

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>
Subject: Re: Spreading full-page writes
Date: 2014-05-28 06:47:33
Message-ID: CA+U5nMKaYv3dyH1QUrw67SYBKRAsdOSDbAzekg6BzYg_1-pbyw@mail.gmail.com
Lists: pgsql-hackers

On 27 May 2014 18:18, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Mon, May 26, 2014 at 8:15 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>
>> On Mon, May 26, 2014 at 1:22 PM, Heikki Linnakangas
>> <hlinnakangas(at)vmware(dot)com> wrote:
>> >>I don't think we know that. The server might have crashed before that
>> >>second record got generated. (This appears to be an unfixable flaw in
>> >>this proposal.)
>> >
>> > The second record is generated before the checkpoint is finished and the
>> > checkpoint record is written. So it will be there.
>> >
>> > (if you crash before the checkpoint is finished, the in-progress
>> > checkpoint is no good for recovery anyway, and won't be used)
>>
>> Hmm, I see.
>>
>> It's not great to have to generate WAL at buffer-eviction time,
>> though. Normally, when we go to evict a buffer, the WAL is already
>> written. We might have to wait for it to be flushed, but if the WAL
>> writer is doing its job, hopefully not. But here we'll definitely
>> have to wait for the WAL flush.
>
>
> I'm not sure we do need to flush it. If the checkpoint finishes, then the
> WAL surely got flushed as part of the process of recording the end of the
> checkpoint. If the checkpoint does not finish, recovery will start from the
> previous checkpoint, which does contain the FPW (because if it didn't, the
> page would not be eligible for this treatment) and so the possibly torn page
> will get overwritten in full.

I think Robert is correct: you would need to flush WAL before writing
the disk buffer. That is the current WAL-before-data invariant.

However, we don't need to do this the simple way, i.e. FPW, flush,
write buffer for each page in turn; we can do it with more buffering.

So it seems like a reasonable idea to do this using a 64-buffer
BulkAccessStrategy object and flush the WAL once every 64 buffers.
That's beginning to look more like double buffering, though...
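
To make the batching idea concrete, here is a rough toy model, not
PostgreSQL code. It assumes LSNs are plain counters, and every name in
it (ToyWAL, BatchedWriter, evict, drain, simulate) is hypothetical and
made up for illustration. It only shows the ordering: queue the FPW
records for up to 64 buffers, flush WAL once, and only then write the
data pages.

```python
from dataclasses import dataclass, field

BATCH_SIZE = 64  # ring size suggested above; purely illustrative


@dataclass
class ToyWAL:
    """Toy WAL in which an LSN is just a record counter (assumption)."""
    insert_lsn: int = 0   # last record inserted, possibly only in memory
    flush_lsn: int = 0    # last record known to be durable
    flush_calls: int = 0  # how many real flushes were issued

    def insert_fpw(self, page_id: int) -> int:
        """Insert a full-page-write record and return its LSN."""
        self.insert_lsn += 1
        return self.insert_lsn

    def flush(self, upto: int) -> None:
        """Make WAL durable up to 'upto'; no-op if already flushed."""
        if upto > self.flush_lsn:
            self.flush_lsn = upto
            self.flush_calls += 1


@dataclass
class BatchedWriter:
    """Hypothetical writer that defers page writes until WAL is flushed."""
    wal: ToyWAL
    pending: list = field(default_factory=list)  # (page_id, fpw_lsn)
    written: list = field(default_factory=list)  # page ids written out

    def evict(self, page_id: int) -> None:
        # Log the FPW now, but do not write the page yet.
        lsn = self.wal.insert_fpw(page_id)
        self.pending.append((page_id, lsn))
        if len(self.pending) >= BATCH_SIZE:
            self.drain()

    def drain(self) -> None:
        if not self.pending:
            return
        # One WAL flush covers every record in the batch.
        self.wal.flush(self.pending[-1][1])
        for page_id, lsn in self.pending:
            # WAL before data: the page's FPW must already be durable.
            assert lsn <= self.wal.flush_lsn
            self.written.append(page_id)
        self.pending.clear()


def simulate(n_pages: int) -> BatchedWriter:
    """Evict n_pages pages through the batched writer, then drain."""
    writer = BatchedWriter(ToyWAL())
    for page_id in range(n_pages):
        writer.evict(page_id)
    writer.drain()
    return writer
```

With this model, simulate(130) issues three WAL flushes (two full
batches of 64 plus a final partial one) instead of 130, while each page
is still written only after its FPW record is durable. The cost is that
up to 64 dirty pages sit in the ring waiting for the flush, which is
where the resemblance to double buffering comes from.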

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
