Re: Synchronous Log Shipping Replication

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Markus Wanner <markus(at)bluegap(dot)ch>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Synchronous Log Shipping Replication
Date: 2008-09-09 11:38:01
Message-ID: 48C66019.40400@enterprisedb.com
Lists: pgsql-hackers

Fujii Masao wrote:
> What makes the sender process bottleneck?

The keyword here is "might". There are many possibilities, such as:
- Slow network.
- Ridiculously fast disk, like a RAM disk. If you have a synchronous
slave you can fail over to, putting WAL on a RAM disk isn't that crazy.
- Slower WAL disk on the slave.
etc.

>> Backends then wait
>> * not at all for asynch commit
>> * just for Write for local synch commit
>> * for both Write and Send for remote synch commit
>> (various additional options for what happens to confirm Send)
>
> I'd like to introduce new parameter "synchronous_replication" which specifies
> whether backends waits for the response from WAL sender process. By
> combining synchronous_commit and synchronous_replication, users can
> choose various options.

There's one thing I haven't figured out in this discussion. Does the
write to the disk happen before or after the write to the slave? Can you
guarantee that if a transaction is committed in the master, it's also
committed in the slave, or vice versa?
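
To make the combinations quoted above concrete, here is a rough sketch of how the two settings might map to what a backend waits for. It assumes both settings are plain booleans; the names WaitFor and commit_waits are made up for illustration and are not from any patch. It also says nothing about the ordering of Write and Send, which is the open question. The fourth combination (synchronous_commit off, synchronous_replication on) is not covered by the list above, so waiting for Send alone is just an assumption here.

```python
from enum import Flag, auto


class WaitFor(Flag):
    """Hypothetical set of events a committing backend waits for."""
    NOTHING = 0
    WRITE = auto()  # local WAL write
    SEND = auto()   # confirmation from the WAL sender


def commit_waits(synchronous_commit: bool,
                 synchronous_replication: bool) -> WaitFor:
    """Combine the two (assumed boolean) settings into a wait set."""
    waits = WaitFor.NOTHING
    if synchronous_commit:
        waits |= WaitFor.WRITE
    if synchronous_replication:
        # Assumption: with synchronous_commit off this means Send only.
        waits |= WaitFor.SEND
    return waits
```

With both settings off the backend waits for nothing (asynch commit), with only synchronous_commit on it waits for Write, and with both on it waits for Write and Send.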

>> Another thought occurs that we might measure the time a Send takes and
>> specify a limit on how long we are prepared to wait for confirmation.
>> Limit=0 => asynchronous. Limit > 0 implies synchronous-up-to-the-limit.
>> This would give better user behaviour across a highly variable network
>> connection.
>
> In the viewpoint of detection of a network failure, this feature is necessary.
> When the network goes down, WAL sender can be blocked until it detects
> the network failure, i.e. WAL sender keeps waiting for the response which
> never comes. A timeout notification is necessary in order to detect a
> network failure soon.

Agreed. But what happens if you hit that timeout? Should we enforce that
timeout within the server, or should we leave that to the external
heartbeat system?
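
As a rough illustration of the "limit" semantics quoted above, the wait could look something like the sketch below. The function name, the return strings and the use of a threading.Event as the stand-in for the slave's confirmation are all assumptions made for the example. It deliberately only reports the timeout; what to do about it is the question above.

```python
import threading


def wait_for_send_ack(ack: threading.Event, limit_seconds: float) -> str:
    """Wait for the slave's confirmation, up to limit_seconds.

    limit_seconds == 0 means asynchronous: do not wait at all.
    """
    if limit_seconds <= 0:
        return "async"
    # Event.wait() returns True if the event was set before the timeout.
    if ack.wait(timeout=limit_seconds):
        return "confirmed"
    # The policy on timeout (fail the commit, degrade to asynchronous,
    # or hand over to an external heartbeat system) is left to the caller.
    return "timed_out"
```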

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
