Re: [streaming replication] 9.1.3 streaming replication bug ?

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: 乔志强 <qiaozhiqiang(at)leadcoretech(dot)com>
Cc: PostgreSQL pg-general List <pgsql-general(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [streaming replication] 9.1.3 streaming replication bug ?
Date: 2012-04-11 16:35:55
Message-ID: CAHGQGwEMtc_ikq0ur5q26a4AuVO06Xgbwr+7pZcWwzu0L+8cFg@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On Thu, Apr 12, 2012 at 12:56 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Apr 11, 2012 at 3:31 PM, 乔志强 <qiaozhiqiang(at)leadcoretech(dot)com> wrote:
>> So in sync streaming replication, if the master deletes WAL before it is sent to the only standby, all transactions will fail forever.
>> "the master tries to avoid a PANIC error rather than termination of replication." But in sync replication, termination of replication is THE bigger PANIC error.
>
> I see your point. When there are backends waiting for replication, the WAL files
> which the standby might not have received yet must not be removed. If they are
> removed, replication keeps failing forever because required WAL files don't
> exist in the master, and then waiting backends will never be released unless
> replication mode is changed to async. This should be avoided.

On second thought, we can avoid the issue simply by setting
wal_keep_segments high enough. Even if the issue happens and some backends
get stuck waiting for replication, we can release them by taking a fresh backup
and restarting the standby from that backup. This is the basic procedure for
restarting replication after it has been terminated because required WAL files
were removed from the master. So this issue might not be worth implementing
a patch for now (though I'm not against improving things in the future); it
seems to be just a matter of tuning wal_keep_segments.
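
As a rough illustration of what "enough" could mean, here is a minimal
sizing sketch. It assumes the default 16 MB WAL segment size, and the WAL
generation rate and the longest standby outage you want to survive are
numbers you would have to measure or choose yourself. The function names
are hypothetical helpers, not part of PostgreSQL.

```python
import math

# Assumption: default WAL segment size of 16 MB (a build-time option).
WAL_SEGMENT_MB = 16


def segments_needed(wal_mb_per_sec, max_outage_sec, segment_mb=WAL_SEGMENT_MB):
    """Estimate a wal_keep_segments value that covers a standby outage.

    wal_mb_per_sec: measured average WAL generation rate on the master.
    max_outage_sec: longest standby disconnection you want to tolerate.
    """
    return math.ceil(wal_mb_per_sec * max_outage_sec / segment_mb)


def disk_needed_mb(segments, segment_mb=WAL_SEGMENT_MB):
    """Disk space in pg_xlog that the retained segments would occupy."""
    return segments * segment_mb


if __name__ == "__main__":
    # Example inputs only: 2 MB/s of WAL, one hour of tolerated outage.
    n = segments_needed(2.0, 3600)
    print("wal_keep_segments =", n)
    print("extra disk (MB)   =", disk_needed_mb(n))
```

This is only an estimate based on an average rate. A burst of write
activity can generate WAL faster than the average, so some headroom on
top of the computed value seems prudent.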

Regards,

--
Fujii Masao
