Re: Replication: slave is in permanent startup 'recovery'

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: "Henry C(dot)" <henka(at)cityweb(dot)co(dot)za>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Replication: slave is in permanent startup 'recovery'
Date: 2011-04-14 08:01:40
Message-ID: 4DA6A9E4.2020305@postnewspapers.com.au
Lists: pgsql-general

On 14/04/2011 2:15 AM, Henry C. wrote:
> Greets,
>
> Pg 9.0.3
>
> This must be due to my own misconfiguration, so apologies if I'm not seeing
> the obvious - I've noticed that my slave seems to be stuck in a permanent
> startup/recovery state.

That's what warm- and hot-standby slaves are: they stay in recovery
permanently, continuously replaying WAL from the master with essentially
the same code that runs during recovery from a bad shutdown. The
advantage is that it's *extremely* well-tested code.
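To make the "permanent recovery" point concrete, here's a toy model of a standby's replay loop. It's a sketch under simplifying assumptions, not PostgreSQL's actual startup-process code; the `replay` function and its `promote_at` parameter are hypothetical. The idea it shows is that the standby never decides on its own that recovery is finished. It only leaves recovery when it is promoted (in 9.0, via `trigger_file` in recovery.conf).

```python
from typing import Iterable, List, Optional, Tuple


def replay(records: Iterable[str], promote_at: Optional[int] = None) -> Tuple[List[str], bool]:
    """Toy model of a standby's replay loop (hypothetical helper).

    WAL records are applied in order, just as during crash recovery.
    The standby leaves recovery only when promoted; running out of WAL
    just means waiting for more. Returns (applied records, still_in_recovery).
    """
    applied: List[str] = []
    for i, record in enumerate(records):
        if promote_at is not None and i == promote_at:
            # Promotion ends recovery; the server starts accepting writes.
            return applied, False
        # Replay one record, the same work crash recovery would do.
        applied.append(record)
    # Stream exhausted: a real standby would simply wait for more WAL,
    # so it is still "in recovery".
    return applied, True
```

For example, `replay(["a", "b", "c"])` leaves the model still in recovery after applying all three records, which is the normal, healthy state for a slave.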

> If I try and execute a long-lived SQL query on the slave, it eventually fails
> with "canceling statement due to conflict with recovery".

That's a limitation of hot standby (running queries on a replication
slave). It's a lot like Oracle's "snapshot too old" problem when undo
space runs out. Essentially, my understanding is that the hot standby
server cannot always replay WAL to keep up with the master's changes
while also running queries, because some WAL records conflict with
queries that are still running on the slave. To avoid getting too far
behind the master because of a huge or stuck query, it waits only so
long and then cancels the conflicting query.
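The wait-then-cancel trade-off can be sketched as a tiny decision function. This is a deliberately simplified model and an assumption on my part: `resolve_conflict` is a hypothetical name, and `max_standby_delay_s` stands in for the 9.0 settings max_standby_streaming_delay / max_standby_archive_delay. The real server measures the grace period against when the WAL was received, cumulatively, rather than per query as here.

```python
from typing import Tuple


def resolve_conflict(query_remaining_s: float, max_standby_delay_s: float) -> Tuple[str, float]:
    """Simplified model of a hot standby resolving a replay/query conflict.

    max_standby_delay_s: how long replay may be held up; -1 means wait
    forever, 0 means cancel conflicting queries immediately.
    Returns (outcome, seconds that replay was held up).
    """
    if max_standby_delay_s < 0 or query_remaining_s <= max_standby_delay_s:
        # Replay waits for the query; the slave falls behind meanwhile.
        return "query finishes", query_remaining_s
    # Grace period used up: cancel the query so replay can continue.
    return "query cancelled", max_standby_delay_s
```

With a 30 s limit, a query that needs 10 more seconds finishes and replay lags by 10 s, while one that needs 600 s is cancelled after replay has waited 30 s. That cancellation is the "conflict with recovery" error.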

Again, from my limited understanding, the reason it can't replay the WAL
is that the WAL records include the cleanup of row versions VACUUMed
away, and of pages re-used, on the master. HS is block-level (physical)
replication; it cannot keep the old contents of a page in place on the
slave when the master has erased or overwritten it.
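A rough way to picture which slave queries get hit is below. It's a simplified sketch, not the server's code: `StandbyQuery` and `conflicting_queries` are hypothetical names, and transaction IDs are plain integers with no wraparound handling. The assumption is that a VACUUM cleanup record carries the newest transaction ID whose deleted rows it removed. Any slave query whose snapshot is old enough (xmin at or below that ID) might still need those rows, so replaying the record conflicts with it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StandbyQuery:
    pid: int
    xmin: int  # oldest transaction ID this query's snapshot may still need


def conflicting_queries(queries: List[StandbyQuery], latest_removed_xid: int) -> List[int]:
    """Return pids of standby queries that a cleanup record would conflict with.

    The master's VACUUM removed row versions deleted by transactions up to
    latest_removed_xid. A query whose xmin <= latest_removed_xid may still
    see those rows as live, so replaying the cleanup would pull them out
    from under it. (Plain integers here; real XIDs wrap around.)
    """
    return [q.pid for q in queries if q.xmin <= latest_removed_xid]
```

For instance, with one query whose snapshot xmin is 100 and another at 250, a cleanup record for xid 200 conflicts only with the first.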

It's theoretically possible for the slave to copy blocks that are about
to be overwritten out-of-line, into a slave-side-only store of blocks
that have been erased on the master but are still needed by transactions
on the slave. The discussion I've read suggests that that'd be ...
complicated ... to make work well, especially with log replay happening
concurrently.

> Replication is
> definitely working (DML actions are propagated to the slave), but something is
> amiss.

Nope, it's working as designed, I'm afraid.

There are params you can tune to control how far slaves are allowed to
get behind the master before cancelling queries. In 9.0 they are
max_standby_archive_delay and max_standby_streaming_delay, and the Hot
Standby section of the manual covers them. Do consider, though, that the
further behind the slave is allowed to fall, the more WAL has to be kept
around until it has been replayed... and if you run out of space for it,
things get ugly.
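For a back-of-envelope sense of the space cost, you can multiply your WAL generation rate by the lag you're willing to allow. This is a rough sketch under stated assumptions: the default 16 MB WAL segment size, a steady WAL rate that you measure yourself, and a hypothetical helper name. It ignores bursts. Compare the result with free disk space wherever the unreplayed WAL ends up, and with wal_keep_segments on the master if the slave may also disconnect for a while.

```python
import math

WAL_SEGMENT_MB = 16  # default WAL segment size


def wal_segments_to_retain(wal_rate_mb_per_s: float, max_lag_s: float) -> int:
    """Rough count of WAL segments that pile up while the slave is allowed
    to lag max_lag_s seconds behind a master producing wal_rate_mb_per_s."""
    return math.ceil(wal_rate_mb_per_s * max_lag_s / WAL_SEGMENT_MB)
```

For example, 2 MB/s of WAL with a 30-minute allowed lag is 3600 MB, i.e. 225 segments.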

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/
