Re: Re: Synch Rep: direct transfer of WAL file from the primary to the standby

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Synch Rep: direct transfer of WAL file from the primary to the standby
Date: 2009-07-07 08:07:06
Message-ID: 4A53022A.7040603@enterprisedb.com
Lists: pgsql-hackers

Fujii Masao wrote:
> On Tue, Jul 7, 2009 at 12:16 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>>> In order for the primary server (ie. a normal backend) to read an archived file,
>>> restore_command also needs to be specified in postgresql.conf. In this case,
>>> how should we handle restore_command in recovery.conf?
>> I confess to not having paid much attention to this thread so far, but ...
>> what is the rationale for having such a capability at all?
>
> If the XLOG files which are required for recovery exist only on the
> primary server,
> the standby server has to read them in some way. For example, when the latest
> XLOG file on the primary server is 09 and the standby server has only 01, the
> missing files (02-08) have to be read for recovery by the standby server. In this
> case, the XLOG records in 09 and later are shipped to the standby server in real
> time by the synchronous replication feature.
>
> The problem which I'd like to solve is how to make the standby server read the
> XLOG files (XLOG files, backup history files and timeline history files) which
> exist only
> on the primary server. With the previous patch, we had to manually copy those
> missing files to the archive of the standby server or use the warm-standby
> mechanism. This would decrease the usability of synchronous replication. So,
> I proposed one solution which makes the standby server read those
> missing files automatically: introducing a new function pg_read_xlogfile() which
> reads the specified XLOG file.

pg_read_xlogfile() feels like quite a hacky way to implement that. Do we
require the master to always have read access to the PITR archive? And
indeed, to have a PITR archive configured to begin with? If you need to
set up archiving just because of the standby server, how do old files
that are no longer required by the standby get cleaned up?

I feel that the master needs to explicitly track the oldest WAL file the
standby might still need, and refrain from deleting that file and
anything newer. IOW, keep enough history in pg_xlog. Then there is the
risk of running out of disk space in pg_xlog if the connection to the
standby is lost for a long time, so we'll need some cap on that, after
which the master declares the standby dead and deletes the old WAL
anyway. Nevertheless, I think that would be much simpler to implement,
and simpler for admins. And if the standby can read old WAL segments
from the PITR archive, in addition to requesting them from the primary,
it is just as safe.

I'd like to see a description of the proposed master/slave protocol for
replication. If I understood correctly, you're proposing that the
standby server connects to the master with libpq like any client,
authenticates as usual, and then sends a message indicating that it
wants to switch to "replication mode". In replication mode, normal FE/BE
messages are not accepted, but there's a different set of message types
for transferring XLOG data.

I'd like to see a more formal description of that protocol and the new
message types, along with some examples of how they would be used in
different scenarios, such as when a standby server connects to the
master for the first time and needs to catch up.

Looking at the patch briefly, it seems to assume that there is only one
WAL sender active at any time. What happens when a new WAL sender
connects while one is active already? While supporting multiple slaves
isn't a priority, I think we should support multiple WAL senders right
from the start. It shouldn't be much harder, and otherwise we need to
ensure that the switch from the old WAL sender to a new one is clean,
which seems non-trivial. Alternatively, we could refuse a new WAL sender
while the old one is still active, but then a dead WAL sender process
(because the standby crashed suddenly, for example) would prevent a new
standby from connecting, possibly for several minutes.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
