Re: Double-writes, take two?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Double-writes, take two?
Date: 2018-04-19 22:28:01
Message-ID: CA+TgmoZouHJ5+BvvhgqOUcuNT22Kp6LBPFt5whp0_KgCSe3nnQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 18, 2018 at 2:22 AM, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> I was thinking about this problem, and it looks that one approach for
> double-writes would be to introduce it as a secondary WAL stream
> independent from the main one:
> - Once a buffer is evicted from shared buffers and is dirty, write it to
> double-write stream and to the data file, and only sync it to the
> double-write stream.
> - At recovery, replay the WAL stream for double-writes first.

I don't really think that this can work. If we're in archive recovery
(i.e. recovery of *indefinite* duration), what does it mean to replay
the double-writes "first"?

What I think probably needs to happen instead is that the secondary
WAL stream contains a bunch of records of the form < LSN, block ID,
page image >. When recovery replays the WAL record for an LSN, it
also restores any double-write images for that LSN. So in effect the
WAL format stays the way it is now, but the full page images are moved
out of line.
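
To make the record layout above concrete, here is a minimal sketch in
Python. It is an illustration only, not PostgreSQL code: WalRecord,
DoubleWriteRecord, index_by_lsn and replay are hypothetical names, and
pages are modeled as plain byte strings. It also assumes that a
restored image replaces the redo action for that block, which mirrors
how in-line full page images are handled today.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical block identifier: (relation name, block number).
BlockId = Tuple[str, int]


@dataclass(frozen=True)
class WalRecord:
    lsn: int
    block: BlockId
    # Redo action: maps the old page contents to the new contents.
    redo: Callable[[bytes], bytes]


@dataclass(frozen=True)
class DoubleWriteRecord:
    # The < LSN, block ID, page image > triple from the secondary stream.
    lsn: int
    block: BlockId
    page_image: bytes


def index_by_lsn(
    dw_stream: Iterable[DoubleWriteRecord],
) -> Dict[int, Dict[BlockId, bytes]]:
    """Group out-of-line page images by the LSN they belong to."""
    by_lsn: Dict[int, Dict[BlockId, bytes]] = defaultdict(dict)
    for rec in dw_stream:
        by_lsn[rec.lsn][rec.block] = rec.page_image
    return by_lsn


def replay(
    wal: Iterable[WalRecord],
    dw_stream: Iterable[DoubleWriteRecord],
    pages: Dict[BlockId, bytes],
) -> Dict[BlockId, bytes]:
    """Replay WAL in LSN order, restoring images as each LSN is reached."""
    images = index_by_lsn(dw_stream)
    for rec in sorted(wal, key=lambda r: r.lsn):
        image = images.get(rec.lsn, {}).get(rec.block)
        if image is not None:
            # An image exists for this LSN: install it over whatever is
            # on disk (possibly a torn page) instead of running redo.
            pages[rec.block] = image
        else:
            pages[rec.block] = rec.redo(pages.get(rec.block, b""))
    return pages
```

Because the images are looked up per LSN during replay, nothing has to
be replayed "first", so the same loop works for recovery of indefinite
duration.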

If this is all done right, the standby should be able to regenerate
the double-write stream without receiving it from the master. That
would be good, because then the volume of WAL from master to standby
would drop by a large amount.

However, it's hard to see how this would perform well. The
double-write stream would have to obey the WAL-before-data rule; that
is, every eviction from shared buffers would have to flush the WAL
*and the double-write buffer*. Unless we're running on hardware where
fsync() is very cheap, such as NVRAM, that increase in the total
number of fsyncs is probably going to pinch. You'd probably want to
have a dwbuf_writer process like wal_writer so that the fsyncs can be
issued concurrently, but I suspect that the filesystem will execute
them sequentially anyway, hence the pinch.
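
The extra flush can be sketched as a toy model in Python. This is an
assumption-laden illustration, not PostgreSQL code: Stream,
BufferManager and evict are hypothetical names, and an fsync is
modeled as a counter increment rather than real I/O.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    """An append-only log that tracks how far it is durable."""
    name: str
    flushed_upto: int = 0
    fsyncs: int = 0

    def flush(self, upto: int) -> None:
        # Pay for an fsync only if something is not yet durable.
        if upto > self.flushed_upto:
            self.fsyncs += 1
            self.flushed_upto = upto


@dataclass
class BufferManager:
    wal: Stream = field(default_factory=lambda: Stream("wal"))
    dwbuf: Stream = field(default_factory=lambda: Stream("dwbuf"))
    data_writes: int = 0

    def evict(self, page_lsn: int, dw_pos: int) -> None:
        """Evict one dirty page under the WAL-before-data rule."""
        # Both streams must be durable before the data page is written.
        self.wal.flush(page_lsn)
        self.dwbuf.flush(dw_pos)
        self.data_writes += 1


bufmgr = BufferManager()
for i in range(1, 4):
    # Worst case: each evicted page is newer than anything flushed so far.
    bufmgr.evict(page_lsn=i * 100, dw_pos=i)
```

In this worst case the three evictions cost three fsyncs on each
stream, six in total. A background process that calls flush() ahead of
eviction moves those fsyncs out of the eviction path, but the model
says nothing about whether the filesystem would run them in parallel.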

I think this is an interesting topic, but I don't plan to work on it
because I have no confidence that I could do it well enough to come
out ahead vs. the status quo.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
