Re: Recovery inconsistencies, standby much larger than primary

From: Greg Stark <stark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>
Subject: Re: Recovery inconsistencies, standby much larger than primary
Date: 2014-02-12 19:27:03
Message-ID: CAM-w4HMJ8qHKfjEBGw6SUjgLV-+zh8ege0cA1PJ5wG=Ys=nAyA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 12, 2014 at 6:55 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Greg Stark <stark(at)mit(dot)edu> writes:
>> On Wed, Feb 12, 2014 at 5:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> How about the attached instead?
>
>> This does possibly allocate an extra block past the target block. I'm
>> not sure how surprising that would be for the rest of the code.
>
> Should be fine; we could end up with an extra block after a failed
> extension operation in any case.

I know it's fine on the active database, I'm not so clear whether it's
compatible with the xlog records from the primary. I suppose it'll
just see an Initialize Page record and happily see the nul block and
initialize it. It's still a bit scary.

I of course think my code is vastly clearer than the existing code but
yes, I see the "unnecessary churn" argument. I think that's the
fundamental problem with lacking tests, it makes the code get more and
more complex as we're reluctant to simplify it out of fear.

>> For what it's worth I've confirmed the bug in wal-e caused the initial
>> problem.
>
> Huh? Bug in wal-e? What bug?

WAL-E actually didn't restore a whole 1GB file due to a transient S3
problem, in fact a bunch of them. It's remarkable that Postgres kept
going with that much data missing. But the arithmetic worked out on
the case I checked it on, which was the last one that I just sent the
xlog record for last night. In that case there was precisely one
segment missing and the relation was extended by the number of
segments you would expect if it filled in that missing segment and
then jumped to the end of the relation.

--
greg

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2014-02-12 19:39:37 issue with gininsert under very high load
Previous Message Robert Haas 2014-02-12 19:17:01 Re: dynamic shared memory and locks