Re: B-tree parent pointer and checkpoints

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: B-tree parent pointer and checkpoints
Date: 2011-09-06 13:45:35
Message-ID: 4E6623FF.1070500@enterprisedb.com
Lists: pgsql-hackers

On 06.09.2011 16:40, Robert Haas wrote:
> On Tue, Sep 6, 2011 at 6:21 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> The way it would work is that on page split the right page is flagged with
>> MISSING_DOWNLINK flag. When the downlink is inserted into the parent, the
>> flag is cleared in the same critical section as the WAL record for the
>> insertion of the parent is written. Normally, a backend would never see the
>> flag set, because the locks on the split pages are not released until the
>> parent record is written and the flag cleared again. But if inserting the
>> downlink fails for any reason, the next inserter or vacuum that steps on the
>> page can finish the split by inserting the downlink.
>>
>> Unfortunately that means holding the locks on the split pages longer than we
>> do at the moment. Currently they are released as soon as the parent page is
>> locked; with this change they would need to be held until the WAL record of
>> the downlink insertion is done. B-tree is so heavily used that I'm a bit
>> hesitant to sacrifice any concurrency there, but I don't think it would be
>> noticeable in practice.
>
> Do you really need to hold the page locks for all that time, or could
> you cheat? Like... release the locks on the split pages but then go
> back and reacquire them to clear the flag...

Hmm, there are two issues with that:

1. While you're not holding the locks on the child pages, someone can
step onto the page and see that the MISSING_DOWNLINK flag is set, and
try to finish the split for you.

2. If you don't hold the page locked while you clear the flag, someone
can start and finish a checkpoint after you've inserted the downlink,
and before you've cleared the flag. You end up in a scenario where the
flag is set, but the page in fact *does* have a downlink in the parent.

So, nope, we can't cheat.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
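
The two issues listed above can be illustrated with a toy model. This is a minimal sketch, not PostgreSQL code: the `Index` class, the helper names, and the idea of modelling a checkpoint as a deep copy of in-memory state are all assumptions made for illustration. The duplicate downlink in the first scenario is likewise an assumed outcome of a naive original backend that does not recheck the flag; the message only says that another backend would try to finish the split.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Index:
    """Toy state: one right-hand split page and its parent (hypothetical)."""
    missing_downlink: bool = False                 # flag on the right page
    downlinks: list = field(default_factory=list)  # entries in the parent


def split(idx):
    idx.missing_downlink = True      # flag is set when the page is split


def insert_downlink(idx):
    idx.downlinks.append("right")


def clear_flag(idx):
    idx.missing_downlink = False


def checkpoint(idx):
    return copy.deepcopy(idx)        # stand-in for the checkpointed image


def finish_split_if_flagged(idx):
    """What a later inserter or vacuum does when it sees the flag."""
    if idx.missing_downlink:
        insert_downlink(idx)
        clear_flag(idx)


def issue_1_interloper():
    idx = Index()
    split(idx)
    # Locks released: another backend steps on the page and sees the flag.
    finish_split_if_flagged(idx)
    # The original backend carries on unaware (assumed naive behaviour).
    insert_downlink(idx)
    clear_flag(idx)
    return idx


def issue_2_checkpoint():
    idx = Index()
    split(idx)
    insert_downlink(idx)
    image = checkpoint(idx)          # lands before the flag is cleared
    clear_flag(idx)
    return image


def holding_locks():
    idx = Index()
    split(idx)
    before = checkpoint(idx)
    # Locks held across both steps: nothing can observe the state between.
    insert_downlink(idx)
    clear_flag(idx)
    after = checkpoint(idx)
    return before, after
```

In this model, `issue_1_interloper()` ends with two downlinks for the same page, and `issue_2_checkpoint()` returns an image in which the flag is still set although the parent already has the downlink. With `holding_locks()`, every image a checkpoint can capture is consistent: the flag is set exactly when the downlink is missing.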
