Re: [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly

From: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly
Date: 2016-11-17 06:10:20
Message-ID: 0A3221C70F24FB45833433255569204D1F641604@G01JPEXMBYT05
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Amit Kapila
> I think it beginning of segment (aka the first page of the segment), even
> the comment indicates the same.
>
> /*
> * Whenever switching to a new WAL segment, we read the first page of
> * the file and validate its header, even if that's not where the
> * target record is. ...
> ..
> */
>
> However, on again looking at the code, it seems that part of code behaves
> similarly for both 9.2 and 9.3.

Yes, the code behaves similarly in 9.2 and later. FYI, ValidXLogHeader() is called at two sites. The earlier one checks the first page of a segment when the real target page is different, and the latter one checks any page including the first page of a segment.

> ..Because node3 found a WAL
> ! * record fragment at the end of segment 10, it expects to find the !
> * remaining fragment at the beginning of WAL segment 11 streamed from !
> * node2. But there was a fragment of a different WAL record, because ! *
> node2 overwrote a different WAL record at the end of segment 10 across !
> * to 11.
>
> How does node3 ensure that the fragment of WAL in segment 11 is different?
> Isn't it possible that when node2 overwrites the last record in WAL segment
> 10, it writes a record of slightly different contents but which is of the
> same size as an original record in WAL segment 10?

That case is detected by checking the CRC value in the XLOG record header.

Regards
Takayuki Tsunakawa

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amul Sul 2016-11-17 06:15:41 Re: Exclude pg_largeobject form pg_dump
Previous Message Etsuro Fujita 2016-11-17 05:55:21 Re: Push down more full joins in postgres_fdw