Re: Standby corruption after master is restarted

From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>, gurkan(dot)gur(at)innogames(dot)com, david(dot)pusch(at)innogames(dot)com, patrick(dot)schmidt(at)innogames(dot)com
Subject: Re: Standby corruption after master is restarted
Date: 2018-04-17 08:55:22
Message-ID: CAE2gYzwDDnYLMW999NGyTjB9Vr84eARJ5-uWdctKq5zPtGeruQ@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

> Can you check if the "incorrect" part of the WAL segment matches some
> previous segment? Verifying that shouldn't be very difficult (just cut a
> bunch of bytes using hexdump, compare to the incorrect data). Assuming
> you still have the WAL archive, of course. That would tell us that the
> corrupted part comes from an old recycled segment.

I had found and saved the recycled WAL file from the archive after the
incident. Here is the hexdump of it at the same position:

0bddfc0 3253 4830 616f 5034 5243 4d79 664f 6164
0bddfd0 3967 592d 7963 7967 5541 4a59 3066 4f50
0bddfe0 2d55 346e 4254 3559 6a4e 726b 4e30 6f52
0bddff0 3876 4751 4a38 5956 5f32 7234 4b55 7045
0bde000 d087 0005 0005 0000 e000 66bd 1dfb 0000
0bde010 1931 0000 0000 0000 5a43 7746 7166 6e34
0bde020 304e 764e 9c32 0158 5400 e709 0900 6f66
0bde030 0765 7375 6111 646e 6f72 6469 370d 312e

If you compare it with the other two dumps I have posted, you will notice
that the corrupted file on the standby is a combination of the two. Its
data starts with the data from the master and continues with the data
from the recycled file. The switch is at position 0bddff8, which is
the position printed as "Minimum recovery ending location" by
pg_controldata.
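
For anyone who wants to repeat this comparison without reading hexdumps by eye, here is a minimal sketch of the check described above. It is an illustration, not the procedure actually used: the function names and the file paths in the usage note are hypothetical, and it assumes the three segment copies (standby, master, recycled) are available as plain files of the same size.

```python
from typing import Optional


def read_window(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes starting at byte `offset` from a file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def find_switch_offset(standby: bytes, master: bytes, recycled: bytes) -> Optional[int]:
    """Return the offset where `standby` stops matching `master`,
    provided that everything from there on matches `recycled`.

    Returns None if the standby data is not such a combination.
    If `standby` matches `master` over the whole compared length,
    that length is returned (no switch was found inside the window).
    """
    n = min(len(standby), len(master), len(recycled))

    # First byte at which the standby copy diverges from the master copy.
    split = n
    for i in range(n):
        if standby[i] != master[i]:
            split = i
            break

    # The remainder must be identical to the recycled segment.
    if standby[split:n] == recycled[split:n]:
        return split
    return None
```

Usage would look roughly like this, with placeholder paths and a window that covers the offsets shown in the hexdump. Note that hexdump prints offsets in hexadecimal, and that bytes which happen to be equal in the master and recycled copies right at the boundary would shift the reported offset later.

```python
start = int("0bddfc0", 16)
window = 128
off = find_switch_offset(
    read_window("standby_segment", start, window),    # hypothetical path
    read_window("master_segment", start, window),     # hypothetical path
    read_window("recycled_segment", start, window),   # hypothetical path
)
if off is not None:
    print(format(start + off, "07x"))
```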

> Hmmm, I see you're using SSL. I don't think that could break affect
> anything, but maybe I should try mimicking this aspect too.

This is the connection information. Note that the master shows SSL
compression as disabled despite it being explicitly requested.

> primary_conninfo = 'host=MASTER_NODE port=5432 dbname=repmgr user=repmgr connect_timeout=10 sslcompression=1'
