Re: New replication mode: write

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New replication mode: write
Date: 2012-01-13 12:27:13
Message-ID: CAHGQGwE+Zxk_yNw0rw8bo__+YzFOvEw3HtCb+8FQL=fzTaPxJA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 13, 2012 at 7:30 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Fri, Jan 13, 2012 at 9:15 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> On Fri, Jan 13, 2012 at 7:41 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>
>>> Thought? Comments?
>>
>> This is almost exactly the same as my patch series
>> "syncrep_queues.v[1,2].patch" earlier this year. Which I know because
>> I was updating that patch myself last night for 9.2. I'm about half
>> way through doing that, since you and I agreed in Ottawa I would do
>> this. Perhaps it is better if we work together?
>
> I think this comment is mostly pointless. We don't have time to work
> together and there's no real reason to. You know what you're doing, so
> I'll leave you to do it.
>
> Please add the Apply mode.

OK, will do.

> In my patch, the reason I avoided doing WRITE mode (which we had
> previously referred to as RECV) was that no fsync of the WAL contents
> takes place. In that case we are applying changes using un-fsynced WAL
> data and in case of crash this would cause a problem.

My patch does not change the execution order of WAL flush and replay.
WAL records are still replayed only after walreceiver has flushed them,
so that problem cannot happen.

However, this means that a transaction might need to wait for a WAL flush
caused by a previous transaction even if WRITE mode is chosen. That limits
the performance gain from WRITE mode and should be improved later, I think.
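
As a rough illustration of that wait, here is a toy timing model. It is not
code from the patch. It assumes a single receiver loop that handles WAL
chunks one at a time, and the names write_ack_times, WRITE_COST and
FLUSH_COST, as well as the cost values, are made up for the example.

```python
WRITE_COST = 1   # hypothetical time units to write one WAL chunk
FLUSH_COST = 10  # hypothetical time units to flush (fsync) one WAL chunk


def write_ack_times(arrivals, flush_in_receiver):
    """Return the time at which each WAL chunk is acknowledged as written.

    arrivals: times at which each chunk reaches the standby, in order.
    flush_in_receiver: if True, the receiver loop flushes each chunk right
    after writing it, before it can start on the next chunk.
    """
    clock = 0
    acks = []
    for arrival in arrivals:
        # The receiver cannot start before the chunk arrives, nor before
        # it has finished its previous work.
        clock = max(clock, arrival) + WRITE_COST
        acks.append(clock)  # a WRITE-mode ack can go out once the write is done
        if flush_in_receiver:
            clock += FLUSH_COST  # the receiver is busy flushing this chunk
    return acks


print(write_ack_times([0, 2], flush_in_receiver=True))   # [1, 12]
print(write_ack_times([0, 2], flush_in_receiver=False))  # [1, 3]
```

In this model the second chunk arrives at time 2, but with the flush done in
the receiver loop its write is only acknowledged at time 12, because the loop
is still flushing the first chunk.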

> I was going to
> make the WalWriter available during recovery to cater for that. Do you
> not think that is no longer necessary?

That's still necessary to further improve sync rep performance, I think.
What I'd like to do (maybe in 9.3dev) after supporting WRITE mode is:

* Allow WAL records to be replayed before they are flushed to disk.
* Add a new GUC parameter specifying whether to allow the standby to defer
  WAL flush. If the parameter is false, walreceiver flushes WAL whenever it
  receives WAL (i.e., the same as the current behavior). If true, walreceiver
  doesn't flush WAL at all. Instead, walwriter, a backend, or the startup
  process does that. Walwriter periodically checks whether there is any
  unflushed WAL file and flushes it if so. When a buffer page is written
  out, the backend or startup process forces a WAL flush up to the buffer's
  LSN.
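
The proposed behavior can be sketched with a small model. This is only an
illustration of the idea, not PostgreSQL code: the class Standby, the
parameter name defer_wal_flush, and the method names are all hypothetical,
and LSNs are plain integers.

```python
class Standby:
    """Toy model of a standby with an optional deferred WAL flush."""

    def __init__(self, defer_wal_flush):
        self.defer_wal_flush = defer_wal_flush  # hypothetical GUC
        self.written_lsn = 0   # WAL written so far (not necessarily on disk)
        self.flushed_lsn = 0   # WAL known to be flushed to disk
        self.flush_count = 0   # number of flushes actually performed

    def _flush_to(self, lsn):
        # Only WAL that has been written can be flushed.
        target = min(lsn, self.written_lsn)
        if target > self.flushed_lsn:
            self.flushed_lsn = target
            self.flush_count += 1

    def receive(self, lsn):
        """walreceiver: write incoming WAL; flush only if not deferred."""
        self.written_lsn = max(self.written_lsn, lsn)
        if not self.defer_wal_flush:
            self._flush_to(self.written_lsn)

    def walwriter_tick(self):
        """walwriter: periodically flush whatever is still unflushed."""
        self._flush_to(self.written_lsn)

    def write_out_buffer(self, page_lsn):
        """backend or startup process: flush WAL up to the page's LSN
        before the page itself is written out."""
        self._flush_to(page_lsn)


deferred = Standby(defer_wal_flush=True)
for lsn in (10, 20, 30):
    deferred.receive(lsn)
print(deferred.flushed_lsn, deferred.flush_count)  # 0 0
deferred.write_out_buffer(20)
print(deferred.flushed_lsn, deferred.flush_count)  # 20 1
deferred.walwriter_tick()
print(deferred.flushed_lsn, deferred.flush_count)  # 30 2
```

With the parameter off, the same three receive() calls would perform three
flushes in this model, one per received chunk.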

If the above GUC parameter is set to true (i.e., walreceiver doesn't flush
WAL at all) and WRITE mode is chosen, a transaction doesn't need to wait
for a WAL flush on the standby at all. WAL flushes on the standby would also
become less frequent, which should significantly reduce the I/O load.
Overall, I expect sync rep performance to improve considerably.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
