Re: Recovery inconsistencies, standby much larger than primary

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Recovery inconsistencies, standby much larger than primary
Date: 2014-01-31 23:34:02
Message-ID: 52EC32EA.8080103@agliodbs.com
Lists: pgsql-hackers

On 01/31/2014 01:11 PM, Tom Lane wrote:
> Greg Stark <stark(at)mit(dot)edu> writes:
>> One thing I keep coming back to is a bad RAM chip setting a bit in the
>> block number. But I just can't seem to get it to add up. The difference is
>> not a power of two, it had happened on two different machines, and we don't
>> see other weirdness on the machine. It seems like a strange coincidence it
>> would happen to the same variable twice and not to other variables.
>
> I also looked at the bit patterns for the two block numbers, and couldn't
> detect any relationship.
>
>> Unless there's some unrelated code writing through a wild pointer, possibly
>> to a stack allocated object that just happens to often be that variable?
>
> Yeah, I'd been wondering if the WAL record somehow got corrupted while
> in memory (presumably after being CRC-checked). It's a bit hard to see
> how though.
>
> Are all the bloated-on-the-slave relations indexes? I think the most
> fruitful thing to do at this point is to try to isolate the bloating
> events for the other affected rels as you've done for this one.
> Maybe we'll see a pattern.

FWIW, we've periodically seen reports from our clients of replica
databases being slightly larger than the master. Nothing reproducible
or as severe as Greg's issue, or we'd have reported it. But this could
be a more widespread issue that simply goes unnoticed because, for most
users, the difference is only in the +5% ballpark.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
