Re: BUG #14702: Streaming replication broken after server closed connection unexpectedly

From: Palle Girgensohn <girgen(at)pingpong(dot)net>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14702: Streaming replication broken after server closed connection unexpectedly
Date: 2017-06-13 07:50:44
Message-ID: 3029E88A-83B1-49BC-B2BA-DB3709AA26F7@pingpong.net
Lists: pgsql-bugs


> On 13 June 2017, at 03:57, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
>
> On Tue, Jun 13, 2017 at 6:52 AM, <girgen(at)pingpong(dot)net> wrote:
>> Setup is simple streaming replication: master -> slave. There is a
>> replication slot at the master, so xlogs should not be removed until the
>> client has received them properly.
>
> Hm. There has been the following discussion as well, which refers to a
> legit bug where WAL segments could be removed even if a slot is used:
> https://www.postgresql.org/message-id/CACJqAM3xVz0JY1XFDKPP+JoJAjoGx=GNuOAshEDWCext7BFvCQ@mail.gmail.com
> The circumstances to trigger the problem are quite particular though
> as it needs an incomplete WAL record at the end of a segment.
>
>> After this, the slave could not be started again, each time the same error
>> about "invalid memory alloc request size 1600487424".
>
> Hm. That smells of data corruption.. Be careful going forward.

I believe that corruption was in the broken WAL file though. I saw some notes pointing in that direction on the list, but sure, I could be mistaken.

>
>> Looking more closely, the last xlog file, let's call it 0000EB, is corrupt
>> on the slave, having a different checksum from the proper one at the master.
>
> To which checksum are you referring here? Did you do yourself a
> calculation using what is on-disk? Note that during streaming
> replication the content of an unfinished segment may be different than
> what is on the primary.

I calculated that myself using sha256 from the command line.

As you say, it was probably an unfinished segment. The problem is that the slave expects the *previous* WAL file to still be available on the master, but it had already been removed. The slave *has* that file, though, so why would it need it to be transferred again? 0000EA was requested even though it had already been completely transferred to the slave. I had to copy 0000EA back to the master so it could be streamed again.
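For reference, the comparison I ran was essentially the following (the paths and the short name 0000EA are illustrative; real segments live under pg_xlog/ with 24-character names). The snippet simulates the two hosts with two local copies so it stands alone:

```shell
# Simulate the primary's and the standby's copies of a finished segment
# (on the real systems these would be the same path on two different hosts).
mkdir -p /tmp/primary_xlog /tmp/standby_xlog
printf 'finished segment contents' > /tmp/primary_xlog/0000EA
printf 'finished segment contents' > /tmp/standby_xlog/0000EA

# For a *finished* segment the checksums should match; an unfinished,
# still-streaming segment may legitimately differ between the hosts.
a=$(sha256sum /tmp/primary_xlog/0000EA | cut -d' ' -f1)
b=$(sha256sum /tmp/standby_xlog/0000EA | cut -d' ' -f1)
if [ "$a" = "$b" ]; then
    echo "0000EA: checksums match"
else
    echo "0000EA: checksums differ"
fi
```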

>
>> Now, I don't know exactly what happened when the slave lost track, but the
>> bug, I think, is that the streamed WAL was corrupt, and still accepted by
>> the slave *and* hence removed from the master. It required too much
>> experience to fix that. The slave should not accept a not fully transported
>> WAL file. It seems it happened during some connection failure between the
>> slave and master, but still it should preferably fail more gracefully. Are
>> the mechanisms implemented to support that, and they failed, or is it just
>> not implemented?
>
> There is a per-record CRC calculation to check the validity of each
> record, and this is processed when fetching each record at recovery as
> a sanity check. That's one way to prevent applying an incorrect
> record. In the event of such an error you would have seen "incorrect
> resource manager data checksum in record at" or similar. It seems to
> me that you should be careful with the primary as well.

OK. "Be careful" is somewhat vague, but I get it. Would a pg_dump + pg_restore, for example, reveal any data corruption? Or is it just not possible to be totally sure unless checksums had been activated (they're not; this is an old database).
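Concretely, the low-tech check I had in mind is just forcing a full read: pg_dump has to decode every row, so decode-level corruption should surface as an error. A sketch, not a guarantee -- it won't catch index corruption or pages that still decode cleanly, and "mydb" is a placeholder:

```shell
# Placeholder database name; adjust connection options for the real cluster.
db=mydb

# Guard so the sketch degrades gracefully where pg_dump is not installed.
if command -v pg_dump > /dev/null 2>&1; then
    # Discard the output; only the exit status matters for this check.
    if pg_dump "$db" > /dev/null; then
        echo "pg_dump read and decoded every row without error"
    else
        echo "pg_dump failed; investigate before trusting the data"
    fi
else
    echo "pg_dump not installed on this host"
fi
```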

> --
> Michael
