Re: Re: Synch Rep: direct transfer of WAL file from the primary to the standby

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Synch Rep: direct transfer of WAL file from the primary to the standby
Date: 2009-07-09 07:16:25
Message-ID: 3f0b79eb0907090016t38841368v45b916c9e57b1fe7@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Tue, Jul 7, 2009 at 8:51 PM, Fujii Masao<masao(dot)fujii(at)gmail(dot)com> wrote:
> http://archives.postgresql.org/message-id/4951108A.5040608@enterprisedb.com
>> I don't think we need or should
>> allow running regular queries before entering "replication mode". the
>> backend should become a walsender process directly after authentication.
>
> I changed the protocol according to your suggestion.
> Here is the current protocol:

Just for the record, I'd like to explain how Heikki's protocol
corresponds to mine.

> ReplicationStart (B)
>    Byte1('l'): Identifies the message as a replication-start indicator.
>    Int32(17): Length of message contents in bytes, including self.
>    Int32: The timeline ID
>    Int32: The start log file of replication
>    Int32: The start byte offset of replication

This corresponds to "StartReplication <begin>". But this is sent
from the primary to the standby, whereas "StartReplication" is sent
in the opposite direction. So, in the current design, the primary
determines the WAL streaming start position, which is the head of
the XLOG file that follows the file switched by walsender.
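
As a rough illustration of the quoted ReplicationStart layout, here is a
minimal Python sketch that packs and unpacks the five fields. The function
names are hypothetical, and network byte order is an assumption carried over
from the rest of the frontend/backend protocol. The length field is simply
set to the quoted value of 17.

```python
import struct

# Fields as quoted: Byte1('l'), Int32 length, Int32 timeline ID,
# Int32 start log file, Int32 start byte offset.
# "!" = network byte order, no padding (an assumption, not stated in the mail).
REPLICATION_START = struct.Struct("!cIIII")

# Length value quoted in the message description.
QUOTED_LENGTH = 17


def encode_replication_start(timeline, log_file, offset):
    """Build a ReplicationStart message (hypothetical helper)."""
    return REPLICATION_START.pack(b"l", QUOTED_LENGTH, timeline, log_file, offset)


def decode_replication_start(buf):
    """Parse a ReplicationStart message into its start position."""
    tag, length, timeline, log_file, offset = REPLICATION_START.unpack(buf)
    if tag != b"l":
        raise ValueError("not a ReplicationStart message")
    return {"timeline": timeline, "log_file": log_file, "offset": offset}


msg = encode_replication_start(timeline=1, log_file=2, offset=0)
start = decode_replication_start(msg)
```

With this layout the standby only reads the start position; it never
proposes one, which is the difference from "StartReplication" noted above.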

> XLogData (B)
>    Byte1('w'): Identifies the message as XLOG records.
>    Int32: Length of message contents in bytes, including self.
>    Int8: Flag bits indicating how the records should be treated.
>    Int32: The log file number of the records.
>    Int32: The byte offset of the records.
>    Byte n: The XLOG records.

This corresponds to "WALRange <begin> <end> <data>". But
XLogData omits <begin> to reduce the wire traffic, because
it can be calculated from <end> and the length of the records.
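
To make the <begin> calculation concrete, here is a small Python sketch. It
assumes the log file number and byte offset carried by XLogData describe the
end of the records, and that one message never crosses a log file boundary.
The function name is hypothetical.

```python
def xlogdata_begin(log_file, end_offset, records):
    """Recover the begin position from the end position and the payload.

    Assumes (log_file, end_offset) is the <end> of the records and that
    the records do not span a log file boundary.
    """
    begin_offset = end_offset - len(records)
    if begin_offset < 0:
        raise ValueError("records span a log file boundary; not handled here")
    return (log_file, begin_offset)


# 512 bytes of records ending at offset 8192 in log file 3.
begin = xlogdata_begin(3, 8192, b"\x00" * 512)
```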

> XLogResponse (F)
>    Byte1('r'):  Identifies the message as ACK for XLOG records.
>    Int32: Length of message contents in bytes, including self.
>    Int8: Flag bits indicating how the records were treated.
>    Int32: The log file number of the records.
>    Int32: The byte offset of the records.

This corresponds to "ReplicatedUpTo <end>". They are almost
the same.

> If there is a missing XLOG file which is required for recovery, the
> startup process connects to the primary as a normal client, and
> receives the binary contents of the file by using the following SQL.
> This has nothing to do with the above protocol. So, the transfer of
> missing file and synchronous XLOG streaming are performed
> concurrently.
>
> COPY (SELECT pg_read_xlogfile('filename', true)) TO STDOUT WITH BINARY

This corresponds to "RequestWAL <begin> <end>". Since the
XLOG file written to the standby has to be recoverable, I use the
filename instead of XLogRecPtr here, and have the primary send
the whole file. Also, the filename can identify not only an XLOG
file but also a history file.
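
As a sketch of how a requested filename could be told apart, the following
Python snippet relies on the usual naming convention: an XLOG segment file
is 24 uppercase hex digits, and a timeline history file is 8 hex digits
followed by ".history". The function name is hypothetical and is not part of
the patch.

```python
import re

# Usual PostgreSQL naming: 24 hex digits for an XLOG segment file,
# 8 hex digits plus ".history" for a timeline history file.
XLOG_FILE = re.compile(r"[0-9A-F]{24}")
HISTORY_FILE = re.compile(r"[0-9A-F]{8}\.history")


def classify_wal_filename(name):
    """Return (kind, timeline) for a requested file name."""
    if XLOG_FILE.fullmatch(name):
        kind = "xlog"
    elif HISTORY_FILE.fullmatch(name):
        kind = "history"
    else:
        raise ValueError("unrecognized file name: %r" % name)
    # The first 8 hex digits are the timeline ID in both cases.
    return (kind, int(name[:8], 16))


segment = classify_wal_filename("000000010000000000000003")
history = classify_wal_filename("00000002.history")
```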

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
