Re: time-delayed standbys

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: time-delayed standbys
Date: 2011-06-29 18:11:34
Message-ID: BANLkTi=33oiYV36cM_nOW7_Pi3t4E=P32g@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 29, 2011 at 1:50 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> As implemented, the feature will work with either streaming
>> replication or with file-based replication.
>
> That sounds like the exact opposite of yours and Fujii's comments
> above. Please explain.

I think our comments above were addressing whether it's feasible to
correct for time skew between the master and the slave. Tom was
arguing that we should try. I was arguing that any system we put
together is likely to be pretty unreliable: good time synchronization
algorithms are quite complex, to my knowledge no one here is an expert
on implementing them, and I don't think we want that much complexity
in the backend. Fujii was pointing out that skew correction won't work
at all if the WAL files are going through the archive rather than
through streaming replication, which (if I understand you correctly)
will be a more common case than I had assumed.

>> I don't see any value in
>> restricting to work ONLY with file-based replication.
>
> As explained above, it won't work in practice because of the amount of
> file space required.

I guess it depends on how busy your system is and how much disk space
you have. If using streaming replication causes pg_xlog to fill up on
your standby, then you can either (1) put pg_xlog on a larger file
system or (2) configure only restore_command and not primary_conninfo,
so that only the archive is used.
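
To make the "how busy versus how much disk" trade-off concrete, here is a
rough back-of-the-envelope sketch. It assumes the default 16 MB WAL segment
size and a steady WAL generation rate; the function name and the example
numbers are purely illustrative and are not part of the patch.

```python
import math

# Default PostgreSQL WAL segment size (16 MB); an assumption if the
# server was built with a different segment size.
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def delayed_wal_backlog(wal_bytes_per_sec, delay_seconds):
    """Estimate the WAL a delayed standby must hold before replaying it.

    Returns (backlog_bytes, segment_count). This ignores bursts and any
    WAL retained for other reasons, so treat it as a lower bound.
    """
    backlog = wal_bytes_per_sec * delay_seconds
    segments = math.ceil(backlog / WAL_SEGMENT_BYTES)
    return backlog, segments


if __name__ == "__main__":
    # Hypothetical workload: 1 MiB/s of WAL with a 4-hour delay.
    backlog, segments = delayed_wal_backlog(1024 * 1024, 4 * 3600)
    print(f"{backlog / 1024**3:.1f} GiB across {segments} segments")
```

If the estimate exceeds what the standby's pg_xlog file system can hold,
that points toward option (1) or (2) above.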

> Or, an alternative question: what will you do when it waits so long
> that the standby runs out of disk space?

I don't really see how that's any different from what happens now. If
(for whatever reason) the master is generating WAL faster than a
streaming standby can replay it, then the excess WAL is going to pile
up someplace, and you might run out of disk space. Time-delaying the
standby creates an additional way for that to happen, but I don't
think it's an entirely new problem.

I am not sure exactly how walreceiver handles it if the disk is full.
I assume it craps out and eventually retries, so probably what will
happen is that, after the standby's pg_xlog directory fills up,
walreceiver will keep erroring out and retrying until replay advances
enough to remove a WAL file and thus permit some more data to be
streamed. If the standby gets far enough behind the master that the
required files are no longer available on the master, then it will
switch to the archive, if available. It might be nice to have a mode
that only allows streaming replication when the amount of free disk
space on the standby is greater than or equal to some threshold, but
that seems like a topic for another patch.
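
The threshold idea could look something like the following sketch. Nothing
like this exists in the patch or, as far as I know, in the backend; the
function names and the threshold parameter are hypothetical, and a real
implementation would live in C inside the server rather than in a script.

```python
import shutil


def streaming_allowed(free_bytes, min_free_bytes):
    """Allow streaming only when free space is at or above the threshold."""
    return free_bytes >= min_free_bytes


def streaming_allowed_for_path(wal_dir, min_free_bytes):
    """Apply the same check to the file system holding wal_dir.

    wal_dir is a hypothetical path to the standby's pg_xlog directory.
    """
    return streaming_allowed(shutil.disk_usage(wal_dir).free, min_free_bytes)
```

When the check fails, the standby would fall back to the archive (if one is
configured) instead of streaming.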

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
