Re: Assertion failure at standby promotion

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Assertion failure at standby promotion
Date: 2013-05-06 10:08:45
Message-ID: 5187812D.6090100@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 03.05.2013 18:17, Fujii Masao wrote:
> Hi,
>
> I got the following assertion failure when I promoted the standby.
>
> 2013-05-04 00:12:31 JST sby1 LOG: received promote request
> 2013-05-04 00:12:31 JST sby1 FATAL: terminating walreceiver process
> due to administrator command
> 2013-05-04 00:12:31 JST sby1 LOG: redo done at 0/6FFE038
> 2013-05-04 00:12:31 JST sby1 LOG: last completed transaction was at
> log time 2013-05-04 00:12:25.878909+09
> 2013-05-04 00:12:31 JST sby1 LOG: selected new timeline ID: 2
> 2013-05-04 00:12:31 JST sby1 LOG: archive recovery complete
> 2013-05-04 00:12:31 JST sby1 LOG: checkpoint starting:
> TRAP: FailedAssertion("!(sentPtr<= sendTimeLineValidUpto)", File:
> "walsender.c", Line: 1465)
> 2013-05-04 00:12:31 JST sby1 LOG: autovacuum launcher started
>
> The way to reproduce this is:
>
> 1. Create one master A, one standby B, and one cascade standby C.
> 2. Run pgbench -i -s 10
> 3. Promote the standby B before pgbench -i finishes

I was able to reproduce this. The assertion checks that if the system is
promoted at WAL location X, we must not have already sent WAL at > X to
the client. As the code stands, that assumption is wrong; the walsender
will merrily stream WAL that hasn't been replayed yet, and the system
can be promoted before replaying all the WAL that has been streamed to a
cascading standby. The comment in GetStandbyFlushRecPtr(), which is the
function that determined how far the WAL may be streamed to a cascading
standby, says this:

> /*
> * We can safely send what's already been replayed. Also, if walreceiver
> * is streaming WAL from the same timeline, we can send anything that
> * it has streamed, but hasn't been replayed yet.
> */

There seems to be two bugs here:

1. This used to work in 9.2, because the startup process would always
replay all the WAL present in pg_xlog before promoting (the WAL present
in pg_xlog was streamed from master). But the refactorings in xlog.c in
9.3devel broke that, so that the startup process can promote earlier.

2. Even after fixing the logic in xlog.c, there is still a corner-case
where the startup process can promote before all the WAL that has been
received from walreceiver has been received. That happens if the WAL
streaming is terminated at a page boundary, rather than at a record
boundary. For example, we might have received WAL up to the page
boundary at 0/5BFA000, but the last *complete* record that we have
received ends at 0/5BF9BD8.

To fix the second issue, I think two things need to happen. First, we
need to suppress the check in walsender. Second, we need to teach the
WAL replay to back off when that happens. At the moment, the replay in
the cascading standby gets stuck, trying to fetch the next page
containing rest of the partial WAL record. Instead, it should throw away
the partial record it has, and resync at the end of the last replayed
record. I think 9.2 has the same bug, BTW. Without support for timeline
switches over streaming replication, it was just more difficult to hit.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2013-05-06 10:10:31 Re: Assertion failure at standby promotion
Previous Message soroosh sardari 2013-05-06 10:04:54 Re: Meaning of keyword category list in src/backend/parser/gram.y