Streaming Replication patch for CommitFest 2009-09

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Streaming Replication patch for CommitFest 2009-09
Date: 2009-09-14 11:24:45
Message-ID: 3f0b79eb0909140424q6bb8e6a3ka63b5816bb1d3c45@mail.gmail.com
Lists: pgsql-hackers

Hi,

Here is the latest version of the Streaming Replication (SR) patch.

The SR patch submitted for the last CommitFest had four major problems.
The latest patch addresses each of them:

> 1. Change the way synchronization is done when standby connects to
> primary. After authentication, standby should send a message to primary,
> stating the <begin> point (where <begin> is an XLogRecPtr, not a WAL
> segment name). Primary starts streaming WAL starting from that point,
> and keeps streaming forever. pg_read_xlogfile() needs to be removed.

In the latest version, the standby first performs archive recovery for as
long as WAL records are available in pg_xlog or in the archive (the latter
only if restore_command is supplied). When recovery hits an error (e.g., no
further WAL file is available), the standby starts the walreceiver process
and asks the primary to ship the WAL records following the last applied
record. From then on the primary sends WAL records continuously, and the
standby continuously receives, writes and replays them.
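
As a rough illustration of that startup sequence (a toy model, not code from
the patch), the sketch below replays whatever is available locally and then
requests everything after the last applied record. The Standby class, its
method names and the use of plain integers as WAL positions are assumptions
made for the example; the real patch works with XLogRecPtr values and a
separate walreceiver process.

```python
from dataclasses import dataclass, field


@dataclass
class Standby:
    """Toy standby: WAL positions are plain integers, not real XLogRecPtr values."""
    local_wal: dict                      # position -> record; stands in for pg_xlog/archive
    last_applied: int = 0
    replayed: list = field(default_factory=list)

    def archive_recovery(self):
        """Replay locally available records until the next one is missing."""
        while self.last_applied + 1 in self.local_wal:
            self.last_applied += 1
            self.replayed.append(self.local_wal[self.last_applied])
        return self.last_applied         # nothing more to read locally

    def stream_from(self, primary_wal):
        """Request every record after last_applied from the primary and replay it."""
        start = self.last_applied + 1    # the <begin> point sent to the primary
        for pos in range(start, len(primary_wal) + 1):
            self.replayed.append(primary_wal[pos - 1])
            self.last_applied = pos
        return start


primary_wal = ["r1", "r2", "r3", "r4", "r5"]
standby = Standby(local_wal={1: "r1", 2: "r2"})
recovered_up_to = standby.archive_recovery()      # replays r1 and r2 from local files
begin_point = standby.stream_from(primary_wal)    # streams r3..r5 from the primary
```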

> 2. The primary should have no business reading back from the archive.
> The standby can read from the archive, as it can today.

I removed the primary's ability to restore archived files. In addition, I
tweaked the checkpoint's recycling policy so that a WAL file the standby
still needs is not removed from pg_xlog before it has been sent.
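
One way to picture such a recycling rule is sketched below. This is an
assumption about the general idea, not the policy the patch actually
implements: a segment may be recycled only if it is older than both the
checkpoint's redo segment and the oldest segment any walsender is still
sending. The function names and the integer segment numbers are hypothetical.

```python
def oldest_segment_to_keep(redo_segment, sending_segments):
    """Lowest segment number that must stay in pg_xlog.

    redo_segment: segment containing the checkpoint's redo point.
    sending_segments: segment each connected walsender is currently sending
                      (empty if no standby is connected).
    """
    return min([redo_segment, *sending_segments])


def recyclable(segments_on_disk, redo_segment, sending_segments):
    """Segments the checkpoint may recycle without starving a standby."""
    keep_from = oldest_segment_to_keep(redo_segment, sending_segments)
    return [seg for seg in segments_on_disk if seg < keep_from]


on_disk = [1, 2, 3, 4, 5, 6]
no_standby = recyclable(on_disk, redo_segment=5, sending_segments=[])
slow_standby = recyclable(on_disk, redo_segment=5, sending_segments=[3])
```

With no standby connected, everything before the redo segment can go; a
standby that is still being sent segment 3 holds back segments 3 and 4.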

> 3. Need to support multiple WALSenders. While multiple slave support
> isn't 1st priority right now, it's not acceptable that a new WALSender
> can't connect while one is active already. That can cause trouble in
> case of network problems etc.

In the latest version, more than one standby can connect to the primary,
and the WAL is shipped to each of them concurrently. The maximum number of
standbys is set by a GUC variable (max_wal_senders: better name?).
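
The effect of such a limit can be modelled as a fixed-size table of walsender
slots, as in the minimal sketch below. The WalSenderSlots class and its
methods are hypothetical; only the max_wal_senders name comes from the patch,
and how the patch stores walsender state is not shown here.

```python
class WalSenderSlots:
    """Toy fixed-size slot table; one slot per connected standby."""

    def __init__(self, max_wal_senders):
        self.slots = [None] * max_wal_senders

    def connect(self, standby_name):
        """Claim the first free slot; return its index, or None if all are taken."""
        for index, owner in enumerate(self.slots):
            if owner is None:
                self.slots[index] = standby_name
                return index
        return None

    def disconnect(self, index):
        """Free a slot so that another standby can connect."""
        self.slots[index] = None


slots = WalSenderSlots(max_wal_senders=2)
first = slots.connect("standby-a")
second = slots.connect("standby-b")
third = slots.connect("standby-c")      # rejected: both slots are in use
slots.disconnect(first)
retry = slots.connect("standby-c")      # succeeds once a slot is free
```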

> 4. It is not acceptable that normal backends have to wait for walsender
> to send data. That means that connecting a standby behind a slow
> connection to the primary can grind the primary to a halt. walsender
> needs to be able to read data from disk, not just from shared memory. (I
> raised this back in December
> http://archives.postgresql.org/message-id/495106FA.1050605@enterprisedb.com)

In the latest version, the walsender reads the WAL records from disk
instead of from wal_buffers. So when a backend needs to evict old data from
wal_buffers to insert new data, it no longer has to wait until the walsender
has read that data.
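
The decoupling can be sketched as follows: the writer appends to a file and
returns immediately, while the sender keeps its own read position and catches
up from disk whenever it runs. This is a simplified model under stated
assumptions (a single append-only file instead of WAL segments, byte offsets
instead of XLogRecPtr, and hypothetical WalFile and WalSender classes), not
the walsender code in the patch.

```python
import os
import tempfile


class WalFile:
    """Toy WAL: one append-only file; positions are byte offsets."""

    def __init__(self, path):
        self.path = path
        open(path, "wb").close()        # create an empty file

    def append(self, data):
        """Write and flush data; never waits for any sender."""
        with open(self.path, "ab") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        return os.path.getsize(self.path)


class WalSender:
    """Reads from disk starting at its own position, independent of writers."""

    def __init__(self, wal, start=0):
        self.wal = wal
        self.sent_up_to = start

    def send_pending(self, max_bytes=8192):
        """Return the next chunk of not-yet-sent bytes (empty if caught up)."""
        with open(self.wal.path, "rb") as f:
            f.seek(self.sent_up_to)
            chunk = f.read(max_bytes)
        self.sent_up_to += len(chunk)
        return chunk


def demo():
    with tempfile.TemporaryDirectory() as directory:
        wal = WalFile(os.path.join(directory, "wal"))
        sender = WalSender(wal)
        wal.append(b"aaa")              # backend writes without waiting
        wal.append(b"bbb")
        first = sender.send_pending()   # sender catches up from disk
        wal.append(b"ccc")
        second = sender.send_pending()
        return first, second, sender.sent_up_to
```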

> As a hint, I think you'll find it a lot easier if you implement only
> asynchronous replication at first. That reduces the amount of
> inter-process communication a lot. You can then add synchronous
> capability in a later commitfest. I would also suggest that for point 4,
> you implement WAL sender so that it *only* reads from disk at first, and
> only add the capability send from wal_buffers later on, and only if
> performance testing shows that it's needed.

I am developing SR in stages, as Heikki suggested.
So note that the current patch provides only the core part of *asynchronous*
log-shipping. There are many TODO items for later CommitFests:
synchronous capability, more useful statistics for SR, some features for
administrators, and so on.

The attached tarball contains several files. A description of each file,
a brief procedure for setting up SR, and a functional overview are on the
wiki. I'm also going to add as much of the SR design description to the wiki
as possible.
http://wiki.postgresql.org/wiki/Streaming_Replication

If you notice anything, please feel free to comment!

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment: SR_0914.tgz (application/x-gzip, 137.4 KB)
