Re: Inadequate thought about buffer locking during hot standby replay

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, daniel(at)heroku(dot)com
Subject: Re: Inadequate thought about buffer locking during hot standby replay
Date: 2012-11-10 17:05:31
Message-ID: CA+U5nM+p6ze4PMd6YvcMDrVru01_OcYZ=43T7EzsoqQgEpt8eQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 9 November 2012 23:24, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> During normal running, operations such as btree page splits are
> extremely careful about the order in which they acquire and release
> buffer locks, if they're doing something that concurrently modifies
> multiple pages.
>
> During WAL replay, that all goes out the window. Even if an individual
> WAL-record replay function does things in the right order for "standard"
> cases, RestoreBkpBlocks has no idea what it's doing. So if one or more
> of the referenced pages gets treated as a full-page image, we are left
> with no guarantee whatsoever about what order the pages are restored in.
> That never mattered when the code was originally designed, but it sure
> matters during Hot Standby when other queries might be able to see the
> intermediate states.
>
> I can't prove that this is the cause of bug #7648, but it's fairly easy
> to see that it could explain the symptom. You only need to assume that
> the page-being-split had been handled as a full-page image, and that the
> new right-hand page had gotten allocated by extending the relation.
> Then there will be an interval just after RestoreBkpBlocks does its
> thing where the updated left-hand sibling is in the index and is not
> locked in any way, but its right-link points off the end of the index.
> If a few indexscans come along before the replay process gets to
> continue, you'd get exactly the reported errors.
>
> I'm inclined to think that we need to fix this by getting rid of
> RestoreBkpBlocks per se, and instead having the per-WAL-record restore
> routines dictate when each full-page image is restored (and whether or
> not to release the buffer lock immediately). That's not going to be a
> small change unfortunately :-(

No, but it looks like a clear bug scenario and a clear resolution also.

I'll start looking at it.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2012-11-10 17:10:36 Re: Further pg_upgrade analysis for many tables
Previous Message Noah Misch 2012-11-10 16:49:39 Re: [PATCH] Patch to compute Max LSN of Data Pages