Re: [BUG] False indication in pg_stat_replication.sync_state

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUG] False indication in pg_stat_replication.sync_state
Date: 2012-10-18 14:41:35
Message-ID: CAHGQGwGGjYH2yPFwftZFi7hHTsqjVytbPjGmB530+xudz4VonQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 18, 2012 at 5:42 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello. My colleague found that pg_stat_replication.sync_state
> shows false state for some condition.
>
> This prevents Pacemaker from completing fail-over that could
> safely be done.
>
> The point is in walsender.c, pg_stat_get_wal_senders() below, (as
> of REL9_2_1)
>
> 1555: if (walsnd->pid != 0)
> 1556: {
> 1557: /*
> 1558: * Treat a standby such as a pg_basebackup background process
> 1559: * which always returns an invalid flush location, as an
> 1560: * asynchronous standby.
> 1561: */
> ! 1562: sync_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ?
> 1563: 0 : walsnd->sync_standby_priority;
>
> Here, XLogRecPtrIsInvalid(walsnd->flush) is defined as
> (walsnd->flush.xrecoff == 0) which becomes true as usual at every
> WAL 'file's (not segments) boundary. xrecoff == 0 is certainly
> invalid for the start point of WAL *RECORD*, but should be
> considered valid in replication stream. This check was introduced
> at 9.2.0 and the version up between 9.1.4 and 9.1.5.
>
> | DEBUG: write 4/0 flush 3/FEFFEC68 apply 3/FEFFEC68
> | DEBUG: write 4/0 flush 4/0 apply 3/FEFFEC68
> | DEBUG: HOGE: flush = 3/FEFFEC68 sync_priority[0] = 1
> | DEBUG: write 4/111C0 flush 4/0 apply 3/FEFFECC0
> !| DEBUG: HOGE: flush = 4/0 sync_priority[0] = 0
>
> This value zero of sync_priority[0] makes sync_status 'async'
> errorneously and confuses Pacemaker.
>
> # The log line marked with 'HOGE' above printed by applying the
> # patch at the bottom of this message and invoking 'select
> # sync_state from pg_stat_replication' periodically. To increase
> # the chance to see the symptom, sleep 1 second for 'file'
> # boundaries :-)
>
> The Heikki's recent(?) commit
> 0ab9d1c4b31622e9176472b4276f3e9831e3d6ba which changes the format
> of XLogRecPtr from logid:xrecoff struct to 64 bit linear address
> would fix the false indication. But I suppose this patch won't be
> applied to existing 9.1.x and 9.2.x because of the modification
> onto streaming protocol.
>
> As far as I see the patch, it would'nt change the meaning of
> XLogRecPtr to change XLogRecPtrIsInvalid from (xrecoff == 0) to
> (xrecoff == 0 && xlogid == 0). But this change affects rather
> wide portion where handling WAL nevertheless what is needed here
> is only to stop the false indication.
>
> On the other hand, pg_basebackup seems return 0/0 as flush and
> apply positions so it seems enough only to add xlogid == 0 into
> the condition. The patch attached for REL9_2_1 does this and
> yields the result following.
>
> | DEBUG: write 2/FEFFFD48 flush 2/FEFFFD48 apply 2/FEFF7AB0
> | DEBUG: write 3/0 flush 2/FEFFFD48 apply 2/FEFF7E88
> | DEBUG: write 3/0 flush 3/0 apply 2/FEFFFD48
> | DEBUG: HOGE: flush = 2/FEFFFD48 sync_priority[0] = 1
> | DEBUG: write 3/E338 flush 3/0 apply 2/FEFFFF80
> !| DEBUG: HOGE: flush = 3/0 sync_priority[0] = 1
>
> I think this patch should be applied for 9.2.2 and 9.1.7.

Looks good to me, though I don't think the source code comment needs
to be updated in the way the patch does.

Regards,

--
Fujii Masao

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-10-18 14:47:12 Re: First draft of snapshot snapshot building design document
Previous Message Alvaro Herrera 2012-10-18 14:24:59 Re: September 2012 commitfest