Re: time-delayed standbys

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: time-delayed standbys
Date: 2011-06-16 18:29:43
Message-ID: BANLkTi=COgk+8cieWxhM7YN9guYXOrDb_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jun 15, 2011 at 1:58 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> When the replication connection is terminated, the standby tries to read
> WAL files from the archive. In this case, there is no walreceiver process,
> so how does the standby calculate the clock difference?

Good question. Also, just because we have streaming replication
available doesn't mean that we should force people to use it. It's
still perfectly legit to set up a standby that only use
archive_command and restore_command, and it would be nice if this
feature could still work in such an environment. I anticipate that
most people want to use streaming replication, but a time-delayed
standby is a good example of a case where you might decide you don't
need it. It could be useful to have all the WAL present (but not yet
applied) if you're thinking you might want to promote that standby -
but my guess is that in many cases, the time-delayed standby will be
*in addition* to one or more regular standbys that would be the
primary promotion candidates. So I can see someone deciding that
they'd rather not have the load of another walsender on the master,
and just let the time-delayed standby read from the archive.

Even if that were not an issue, I'm still more or less of the opinion
that trying to solve the time synchronization problem is a rathole
anyway. To really solve this problem well, you're going to need the
standby to send a message containing a timestamp, get a reply back
from the master that contains that timestamp and a master timestamp,
and then compute based on those two timestamps plus the reply
timestamp the maximum and minimum possible lag between the two
machines. Then you're going to need to guess, based on several cycles
of this activity, what the actual lag is, and adjust it over time (but
not too quckly, unless of course a large manual step has occurred) as
the clocks potentially drift apart from each other. This is basically
what ntpd does, except that it can be virtually guaranteed that our
implementation will suck by comparison. Time synchronization is
neither easy nor our core competency, and I think trying to include it
in this feature is going to result in a net loss of reliability.

>> errmsg("parameter \"%s\" requires a temporal value", "recovery_time_delay"),
>
> We should s/"a temporal"/"an Integer"?

It seems strange to ask for an integer when what we want is an amount
of time in seconds or minutes...

> After we run "pg_ctl promote", time-delayed replication should be disabled?
> Otherwise, failover might take very long time when we set recovery_time_delay
> to high value.

Yeah, I think so.

> http://forge.mysql.com/worklog/task.php?id=344
> According to the above page, one purpose of time-delayed replication is to
> protect against user mistakes on master. But, when an user notices his wrong
> operation on master, what should he do next? The WAL records of his wrong
> operation might have already arrived at the standby, so neither "promote" nor
> "restart" doesn't cancel that wrong operation. Instead, probably he should
> shutdown the standby, investigate the timestamp of XID of the operation
> he'd like to cancel, set recovery_target_time and restart the standby.
> Something like this procedures should be documented? Or, we should
> implement new "promote" mode which finishes a recovery as soon as
> "promote" is requested (i.e., not replay all the available WAL records)?

I like the idea of a new promote mode; and documenting the other
approach you mention doesn't sound bad either. Either one sounds like
a job for a separate patch, though.

The other option is to pause recovery and run pg_dump...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bernd Helmle 2011-06-16 18:30:48 Re: Re: starting to review the Extend NOT NULL representation to pg_constraint patch
Previous Message Heikki Linnakangas 2011-06-16 18:26:45 Re: WIP: Fast GiST index build