Re: [HACKERS] [streaming replication] 9.1.3 streaming replication bug ?

From: Michael Nolan <htfoot(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: 乔志强 <qiaozhiqiang(at)leadcoretech(dot)com>, PostgreSQL pg-general List <pgsql-general(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [streaming replication] 9.1.3 streaming replication bug ?
Date: 2012-04-11 16:09:51
Message-ID: CAOzAquK6eE8y9KJv=TsxiwWbGbx4-jSQHe3waDgqfr-WidksNQ@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On 4/11/12, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Apr 11, 2012 at 3:31 PM, 乔志强 <qiaozhiqiang(at)leadcoretech(dot)com> wrote:
>> So in sync streaming replication, if master delete WAL before sent to the
>> only standby, all transaction will fail forever,
>> "the master tries to avoid a PANIC error rather than termination of
>> replication." but in sync replication, termination of replication is THE
>> bigger PANIC error.
>
> I see your point. When there are backends waiting for replication, the WAL
> files which the standby might not have received yet must not be removed. If
> they are removed, replication keeps failing forever because required WAL
> files don't exist in the master, and then waiting backends will never be
> released unless replication mode is changed to async. This should be avoided.
>
> To fix this issue, we should prevent the master from deleting the WAL files
> including the minimum waiting LSN or bigger ones. I'll think more and
> implement the patch.
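
To make the proposed rule concrete, here is a rough sketch of the retention
check as I read it. It is not the actual patch or PostgreSQL code. It assumes
a simplified model in which an LSN is a plain byte offset into the WAL stream
and segments are numbered by integer division with the default 16 MiB segment
size. The names segment_of and removable_segments are made up for this
illustration.

```python
# Simplified model, not PostgreSQL source: an LSN is a byte offset into WAL.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # 16 MiB, the default segment size


def segment_of(lsn):
    """Return the number of the WAL segment that contains the given LSN."""
    return lsn // WAL_SEGMENT_SIZE


def removable_segments(segments, redo_lsn, min_waiting_lsn=None):
    """Return the segment numbers that may be recycled after a checkpoint.

    segments        -- segment numbers currently present on the master
    redo_lsn        -- redo pointer of the latest checkpoint
    min_waiting_lsn -- smallest LSN any backend is waiting on for a sync
                       standby, or None if no backend is waiting
    """
    # Crash recovery needs everything from the redo pointer onward.
    keep_from = segment_of(redo_lsn)
    # Proposed extra rule: also keep the segment holding the minimum
    # waiting LSN, and every later segment.
    if min_waiting_lsn is not None:
        keep_from = min(keep_from, segment_of(min_waiting_lsn))
    return sorted(s for s in segments if s < keep_from)
```

With segments 0 through 9 on disk, a redo pointer in segment 8 and a backend
waiting on an LSN in segment 3, only segments 0 to 2 would be removable under
this rule. Without the waiting backend, segments 0 to 7 would be.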

With asynchronous replication, does the master even know if a slave
fails because of a WAL problem? And does/should it care?

Isn't there a separate issue with synchronous replication? If it
fails, what's the appropriate action to take on the master? PANICing
it seems to be a bad idea, but having transactions never complete
because they never hear back from the synchronous slave (for whatever
reason) seems bad too.
--
Mike Nolan
