Re: B-tree parent pointer and checkpoints

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: B-tree parent pointer and checkpoints
Date: 2011-09-06 13:40:08
Message-ID: CA+TgmoaH0Qy10CdRMDQH7-ELkD9FawnkYOOzE06oMYB4n3Ac9g@mail.gmail.com
Lists: pgsql-hackers

On Tue, Sep 6, 2011 at 6:21 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Nope.
>
> On a closer look, this isn't only a problem for page deletion. Page
> splitting also barfs if it can't find the parent of a page. As the code
> stands, a missing downlink is not harmless, but causes all sorts of trouble.
>
> The window for this to happen with a checkpoint is extremely tight, but
> there's another situation where you can end up with a missing downlink: if
> you run out of disk space while splitting a parent page to make room for
> the downlink.
>
> I think we should fix b-tree the same way I fixed GiST, and put a
> flag on pages with missing downlinks. Then we can fix the missing downlinks
> in vacuum and insertion, and get rid of the code to fix incomplete splits
> after WAL replay.
>
> The way it would work is that on page split the right page is flagged with
> the MISSING_DOWNLINK flag. When the downlink is inserted into the parent, the
> flag is cleared in the same critical section in which the WAL record for the
> insertion into the parent is written. Normally, a backend would never see the
> flag set, because the locks on the split pages are not released until the
> parent record is written and the flag cleared again. But if inserting the
> downlink fails for any reason, the next inserter or vacuum that steps on the
> page can finish the split by inserting the downlink.
>
> Unfortunately that means holding the locks on the split pages longer than we
> do at the moment. Currently they are released as soon as the parent page is
> locked; with this change they would need to be held until the WAL record of
> the downlink insertion is done. B-tree is so heavily used that I'm a bit
> hesitant to sacrifice any concurrency there, but I don't think it would be
> noticeable in practice.
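
For readers following along, the flag protocol described in the quoted text can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: MISSING_DOWNLINK is the flag name from the proposal, but ToyPage and every toy_* function are hypothetical and are not PostgreSQL's page layout or nbtree API. A full parent stands in for "inserting the downlink fails for any reason".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag name from the proposal; the value is an arbitrary assumption. */
#define MISSING_DOWNLINK 0x01u
#define MAX_DOWNLINKS 4

/* Hypothetical toy page, not PostgreSQL's real page layout. */
typedef struct ToyPage {
    unsigned flags;
    struct ToyPage *downlinks[MAX_DOWNLINKS]; /* used only by parent pages */
    size_t ndownlinks;
} ToyPage;

/* Page split: the new right page starts out flagged as lacking a downlink. */
static void toy_split(ToyPage *right)
{
    assert(right != NULL);
    right->flags |= MISSING_DOWNLINK;
}

/*
 * Insert a downlink for `child` into `parent` and clear the flag in the
 * same step (standing in for "the same critical section"). A full parent
 * models a failed insertion, which leaves the flag set on the child.
 */
static bool toy_insert_downlink(ToyPage *parent, ToyPage *child)
{
    assert(parent != NULL && child != NULL);
    if (parent->ndownlinks >= MAX_DOWNLINKS)
        return false;
    parent->downlinks[parent->ndownlinks++] = child;
    child->flags &= ~MISSING_DOWNLINK;
    return true;
}

/* What a later inserter or vacuum would do when it steps on a page. */
static bool toy_finish_split_if_needed(ToyPage *parent, ToyPage *page)
{
    if ((page->flags & MISSING_DOWNLINK) == 0)
        return true;            /* split already complete */
    return toy_insert_downlink(parent, page);
}
```

In this model a failed toy_insert_downlink leaves the page flagged, and a later call to toy_finish_split_if_needed completes the split once the parent has room, which is the recovery path the proposal relies on instead of fixing incomplete splits after WAL replay.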

Do you really need to hold the page locks for all that time, or could
you cheat? Like... release the locks on the split pages but then go
back and reacquire them to clear the flag...
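
To make the two lock orderings concrete, here is a rough sketch with toy locks. Everything in it is an assumption for illustration: ToyPage, toy_lock, toy_unlock and both toy_split_* functions are hypothetical, the "lock" is just a boolean, and the exact point at which the split page is relocked is a guess, since the suggestion above does not spell it out.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag name from the quoted proposal; the value is an arbitrary assumption. */
#define MISSING_DOWNLINK 0x01u

/* Hypothetical toy page with a boolean standing in for a buffer lock. */
typedef struct ToyPage {
    unsigned flags;
    bool locked;
    int ndownlinks;
} ToyPage;

static void toy_lock(ToyPage *p)   { assert(!p->locked); p->locked = true; }
static void toy_unlock(ToyPage *p) { assert(p->locked);  p->locked = false; }

/* Variant A: keep the split page locked until the flag is cleared. */
static void toy_split_hold(ToyPage *parent, ToyPage *right)
{
    toy_lock(right);
    right->flags |= MISSING_DOWNLINK;
    toy_lock(parent);
    parent->ndownlinks++;               /* downlink inserted */
    right->flags &= ~MISSING_DOWNLINK;  /* cleared while both are locked */
    toy_unlock(parent);
    toy_unlock(right);
}

/* Variant B: release the split page early, reacquire it to clear the flag. */
static void toy_split_reacquire(ToyPage *parent, ToyPage *right)
{
    toy_lock(right);
    right->flags |= MISSING_DOWNLINK;
    toy_lock(parent);
    toy_unlock(right);                  /* released once parent is locked */
    parent->ndownlinks++;               /* downlink inserted */
    toy_unlock(parent);
    toy_lock(right);                    /* go back for the flag */
    right->flags &= ~MISSING_DOWNLINK;
    toy_unlock(right);
}
```

One consequence visible in variant B: between the early unlock and the relock, the flag is set on an unlocked page, so other backends could see it, and would have to tolerate that or finish the split themselves.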

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
