Re: Recovery inconsistencies, standby much larger than primary

From: Greg Stark <stark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>
Subject: Re: Recovery inconsistencies, standby much larger than primary
Date: 2014-02-01 02:32:21
Message-ID: CAM-w4HOnmqPXXRiA+-ojbVmt0-rtAEo0zM47Kea7dV9a0T-W+g@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 31, 2014 at 10:11 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Yeah, I'd been wondering if the WAL record somehow got corrupted while
> in memory (presumably after being CRC-checked). It's a bit hard to see
> how though.

One thing I mentioned early on that bears repeating: this instance is
9.1.11.

Also something that occurred to me at 3am -- the "reference to invalid
pages" recovery errors might also explain why the slave seems to
operate correctly. It's possible that after the panic it replayed
those same records correctly.

> Are all the bloated-on-the-slave relations indexes? I think the most
> fruitful thing to do at this point is to try to isolate the bloating
> events for the other affected rels as you've done for this one.
> Maybe we'll see a pattern.

I'll poke at those others tomorrow/today. I can also try to bring up a
new standby from the same base backup, but it'll take time. It's a
large database. Also, following on from the above, my fear is that if
I set a recovery target I might make it miss the bug.

--
greg
