Re: Index use during Hot Standby

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index use during Hot Standby
Date: 2008-10-20 20:09:09
Message-ID: 1224533349.3808.850.camel@ebony.2ndQuadrant
Lists: pgsql-hackers


On Mon, 2008-10-20 at 21:11 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > OK, I think I've found a problem.
> >
> > In _bt_insertonpg(), if we split we do _bt_split() then do
> > _bt_insert_parent(), which then does _bt_insertonpg() recursively.
> >
> > _bt_split() writes a WAL record but continues holding write locks.
> > btree_xlog_split() reads WAL record and does *not* continue to hold
> > write locks. So recovery locking differs from Lehman & Yao requirements
> > at that point.
>
> Hmm. I don't have Lehman & Yao's paper at hand, but I fail to see what
> would go wrong.
>
> Recovery of a split works like this:
>
> 1. Reconstruct new right sibling from scratch. Keep locked
> 2. Update old page (= new left sibling). Keep locked
> 3. Release locks on both pages.
> 4. Update the left-link of the page to the right of the new right sibling.
>
> Searches descending work just fine without the pointer in the parent
> page to the new right sibling, just slower because they will always land
> on the left sibling, and might have to move right from there. Searchers
> moving from left to right work fine; they will see either the old page,
> or both the new left and right sibling. Searchers moving right to left
> will likewise work; they will see either the old page, or the new right,
> then left page, or between steps 3 and 4, they will move to the left
> page, see that the right-link doesn't point to the page it came from,
> and move right to the new right sibling.
>
> All that works just like during normal operation, so I don't actually
> understand why L&Y requires that you keep the split pages locked until
> you've locked the parent. Maybe it's needed to handle concurrent inserts
> or splits, but there can't be any during WAL replay.
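
The sibling-link argument quoted above can be checked with a toy model.
This is only a sketch, not PostgreSQL code: the page names A, B, C, the
Page class and all function names are made up for illustration, and
locks are not modelled. Steps 1 and 2 are applied together because
readers cannot see between them while both pages are locked.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Page:
    keys: List[int]
    left: Optional[str] = None
    right: Optional[str] = None


def initial_tree() -> Dict[str, Page]:
    # Two leaf pages before the split: A <-> C.
    return {
        "A": Page([1, 2, 3, 4], left=None, right="C"),
        "C": Page([9, 10], left="A", right=None),
    }


def replay_steps_1_to_3(pages: Dict[str, Page]) -> None:
    # Step 1: build the new right sibling B from scratch.
    pages["B"] = Page([3, 4], left="A", right="C")
    # Step 2: shrink the old page, which becomes the new left sibling.
    pages["A"].keys = [1, 2]
    pages["A"].right = "B"
    # Step 3 (lock release) is implicit: readers see this state next.


def replay_step_4(pages: Dict[str, Page]) -> None:
    # Step 4: fix the left-link of the page to the right of B.
    pages["C"].left = "B"


def scan_forward(pages: Dict[str, Page], start: str = "A") -> List[int]:
    out: List[int] = []
    cur: Optional[str] = start
    while cur is not None:
        out.extend(pages[cur].keys)
        cur = pages[cur].right
    return out


def walk_left(pages: Dict[str, Page], origin: str) -> Optional[str]:
    # Follow the left-link, then move right until we reach the page
    # whose right-link points back at the page we came from.
    cand = pages[origin].left
    while cand is not None and pages[cand].right != origin:
        cand = pages[cand].right
    return cand


def scan_backward(pages: Dict[str, Page], start: str = "C") -> List[int]:
    out: List[int] = []
    cur: Optional[str] = start
    while cur is not None:
        out.extend(reversed(pages[cur].keys))
        cur = walk_left(pages, cur)
    return out
```

Running both scans before the replay, between steps 3 and 4, and after
step 4 gives the same key sequence each time. Between steps 3 and 4 the
stale left-link on C still points at A, and walk_left() recovers by
moving right to B.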

I think you're right to question that. I was happy to say "locking must
be identical", which is correct, but the assumptions are different in
recovery, as you point out. The details you cite are not as important as
the realisation that we need to get concurrency correct from the
perspective of only a single inserter and many readers. I'd overlooked
that basic assumption, but it's important we clearly state that:
"Recovery operations are serialized and therefore recovery operations
need not fully emulate the locking required for multiple concurrent
writers."

Grokking the paper suggests to me you are correct and that the double
locking can only be required for correct ordering of page split
operations. It clearly isn't needed at all for the correctness proof
with a single inserter on p.350.

So updating blocks for a page split on any one level is performed by a
single WAL record and is therefore atomic. None of the things I worried
about earlier need addressing, so we're back to where I was this
morning: no changes required for correct concurrency behaviour.
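
To illustrate the descent case from Heikki's mail (a search arriving
after the leaf-level split has been replayed but before the parent has
a pointer to the new right sibling), here is a second toy sketch. It is
a simplification under stated assumptions, not the nbtree code: Leaf,
search(), the inclusive high_key and the flat one-level parent are all
hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Leaf:
    keys: List[int]
    high_key: Optional[int]  # inclusive upper bound; None = rightmost page
    right: Optional[str]


# Parent entries are (inclusive upper bound or None, child page name).
Parent = List[Tuple[Optional[int], str]]


def search(leaves: Dict[str, Leaf], parent: Parent, key: int):
    # Descend: pick the first child whose bound covers the key.
    page = next(c for bound, c in parent if bound is None or key <= bound)
    hops = 0
    # Move right while the key lies beyond this page's high key.
    while leaves[page].high_key is not None and key > leaves[page].high_key:
        page = leaves[page].right
        hops += 1
    return key in leaves[page].keys, page, hops


# Leaf level after the split of A into A and B has been replayed.
leaves = {
    "A": Leaf([1, 2], high_key=2, right="B"),
    "B": Leaf([3, 4], high_key=4, right="C"),
    "C": Leaf([9, 10], high_key=None, right=None),
}

# Parent before and after the downlink to B is inserted.
stale_parent: Parent = [(4, "A"), (None, "C")]
fresh_parent: Parent = [(2, "A"), (4, "B"), (None, "C")]
```

With stale_parent, a search for key 3 lands on A and takes one extra
hop right to reach B. With fresh_parent it lands on B directly. The
answer is the same either way, only the cost differs.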

Thanks everybody for a valuable discussion.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
