Re: Proposal for 9.1: WAL streaming from WAL buffers

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal for 9.1: WAL streaming from WAL buffers
Date: 2010-07-06 23:44:24
Message-ID: AANLkTincoP4sYkLhtRr7iQ-js7ilvyogThfHyElTlX1U@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> In 9.0, walsender reads WAL always from the disk and sends it to the standby.
> That is, we cannot send WAL until it has been written (and flushed) to the disk.
> This degrades the performance of synchronous replication very much since a
> transaction commit must wait for the WAL write time *plus* the replication time.
>
> The attached patch enables walsender to read data from WAL buffers in addition
> to the disk. Since we can write and send WAL simultaneously, in synchronous
> replication, a transaction commit has only to wait for either of them. So the
> performance would significantly increase.

To recap the previous discussion on this thread, we ended up changing
the behavior of 9.0 so that it only sends WAL which has been written
to the OS *and flushed*, because sending unflushed WAL to the standby
is unsafe. The standby can get ahead of the master while still
believing that the databases are in sync, due to the fact that after
an SR reconnect we rewind to the start of the current WAL segment.
This results in a silently corrupt standby database.

If it's unsafe to send written but unflushed WAL to the standby, then
for the same reasons we can't send unwritten WAL either. Therefore, I
believe that this entire patch in its current form is a nonstarter and
we should mark it Rejected in the CF app so that reviewers don't
unnecessarily spend time on it.

Having said that, I do think we urgently need some high-level design
discussion on how sync rep is actually going to handle this issue
(perhaps on a new thread). If we can't resolve this issue, sync rep
is going to be really slow; but there are no easy solutions to this
problem in sight, so if we want to have sync rep for 9.1 we'd better
agree on one of the difficult solutions soon so that work can begin.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message KaiGai Kohei 2010-07-06 23:46:05 Re: Bug? Concurrent COMMENT ON and DROP object
Previous Message Robert Haas 2010-07-06 23:09:21 Re: Re: [COMMITTERS] pgsql: Fix log_temp_files docs and comments to say bytes not kilobytes.