Re: Synchronous Log Shipping Replication

From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Bruce Momjian" <bruce(at)momjian(dot)us>, "Markus Wanner" <markus(at)bluegap(dot)ch>, "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Synchronous Log Shipping Replication
Date: 2008-09-09 11:12:32
Message-ID: 3f0b79eb0809090412r1fa85a81i331c3db7ef833437@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 9, 2008 at 5:11 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> Yes. We should have a LogwrtRqst pointer and LogwrtResult pointer for
> the send operation. The Write and Send operations can then continue
> independently of one another. XLogInsert() cannot advance to a new page
> while we are waiting to send or write.

Agreed.

To support the various synchronous options, the Write and Send operations should
be treated separately. So I'll introduce an XLogCtlSend structure, which holds the
shared state for WAL sending. XLogCtlInsert might need a new field, LogsndResult,
which indicates the byte position up to which WAL has already been sent. As you say,
AdvanceXLInsertBuffer() must check both the position that has already been written
(fsynced) and the position that has been sent. I'm working on the detailed design now :)
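
To illustrate the check, here is a rough sketch only, not the actual design.
WalPos, WalProgress and both function names are hypothetical stand-ins (the
real code would use XLogRecPtr and the shared XLogCtl state under proper
locking). The idea is that a WAL buffer page can be reused only once both
Write and Send have passed its end:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat byte position in the WAL stream (a stand-in, not XLogRecPtr). */
typedef uint64_t WalPos;

/* Hypothetical progress pointers; in the proposal these would live in shared memory. */
typedef struct WalProgress
{
    WalPos written;   /* local Write (fsync) has completed up to here */
    WalPos sent;      /* Send to the standby has completed up to here */
} WalProgress;

/* Position that both Write and Send have already passed. */
static WalPos
wal_safe_upto(const WalProgress *p)
{
    assert(p != NULL);
    return (p->written < p->sent) ? p->written : p->sent;
}

/* A buffer page ending at page_end may be reused only if both are done with it. */
static int
wal_page_reusable(const WalProgress *p, WalPos page_end)
{
    return page_end <= wal_safe_upto(p);
}
```

With written = 8192 and sent = 4096, a page ending at 8192 is not yet
reusable, so the slower of the two operations is what holds back insertion.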

> Notice that the Send process
> might be the bottleneck - that is the price of synchronous replication.

Really? In the benchmark results from my prototype, the bottleneck is
still disk I/O.
The communication latency between the master and the slave is smaller than
the latency of WAL writing (fsyncing). Of course, I'm assuming a reasonably
fast network such as 1000BASE-T.

What would make the sender process the bottleneck?

> Backends then wait
> * not at all for asynch commit
> * just for Write for local synch commit
> * for both Write and Send for remote synch commit
> (various additional options for what happens to confirm Send)

I'd like to introduce a new parameter, "synchronous_replication", which specifies
whether backends wait for the response from the WAL sender process. By
combining synchronous_commit and synchronous_replication, users can
choose among the various options.
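
As a rough sketch of how the two settings might combine (one possible
reading, not a settled design; the CommitWait enum and commit_wait_mode()
are hypothetical names, and treating "synchronous_commit off,
synchronous_replication on" as "wait for Send only" is an assumption):

```c
#include <stdbool.h>

/* Hypothetical: what a committing backend waits for. */
typedef enum CommitWait
{
    COMMIT_WAIT_NONE,           /* asynchronous commit */
    COMMIT_WAIT_WRITE,          /* local Write (fsync) only */
    COMMIT_WAIT_SEND,           /* Send to the standby only (assumed meaning) */
    COMMIT_WAIT_WRITE_AND_SEND  /* both Write and Send */
} CommitWait;

/* Map the two boolean parameters onto a wait mode. */
static CommitWait
commit_wait_mode(bool synchronous_commit, bool synchronous_replication)
{
    if (synchronous_commit && synchronous_replication)
        return COMMIT_WAIT_WRITE_AND_SEND;
    if (synchronous_commit)
        return COMMIT_WAIT_WRITE;
    if (synchronous_replication)
        return COMMIT_WAIT_SEND;
    return COMMIT_WAIT_NONE;
}
```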

> After (or during) XLogInsert backends will sleep in a proc queue,
> similar to LWlocks and protected by a spinlock. When preparing to
> write/send the WAL process should read the proc at the *tail* of the
> queue to see what the next LogwrtRqst should be. Then it performs its
> action and wakes procs up starting with the head of the queue. We would
> add LSN into PGPROC, so WAL processes can check whether the backend
> should be woken. The LSN field can be accessed without spinlocks since
> it is only ever set by the backend itself and only read while a backend
> is sleeping. So we access spinlock, find tail, drop spinlock then read
> LSN of the backend that (was) the tail.

Do you mean only the XLogInsert for a commit record, or every XLogInsert?
Either way, ISTM that the response time would get worse :(

> Another thought occurs that we might measure the time a Send takes and
> specify a limit on how long we are prepared to wait for confirmation.
> Limit=0 => asynchronous. Limit > 0 implies synchronous-up-to-the-limit.
> This would give better user behaviour across a highly variable network
> connection.

From the viewpoint of detecting network failures, this feature is necessary.
When the network goes down, the WAL sender can block until it detects
the failure, i.e. it keeps waiting for a response that never comes.
A timeout is needed in order to detect a network failure promptly.
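
For illustration only, a bounded wait could look roughly like the sketch
below, using POSIX poll(). ReplyStatus and wait_for_standby_reply() are
hypothetical names, the socket is assumed to be already connected, and
this is not how the prototype actually does it:

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical result of waiting for the standby's response. */
typedef enum ReplyStatus
{
    REPLY_READY,    /* a response can be read from the socket */
    REPLY_TIMEOUT,  /* nothing arrived within the limit */
    REPLY_ERROR     /* poll() failed or the connection is broken */
} ReplyStatus;

/*
 * Wait at most limit_ms milliseconds for data on sock.
 * limit_ms = 0 returns immediately, i.e. it does not wait at all.
 */
static ReplyStatus
wait_for_standby_reply(int sock, int limit_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = sock;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Retry if interrupted by a signal (the full limit is restarted). */
    do
    {
        rc = poll(&pfd, 1, limit_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return REPLY_ERROR;
    if (rc == 0)
        return REPLY_TIMEOUT;
    return (pfd.revents & POLLIN) ? REPLY_READY : REPLY_ERROR;
}
```

On REPLY_TIMEOUT the WAL sender could treat the standby as unreachable
rather than blocking until the TCP stack reports the failure.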

regards

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
