Re: Nasty btree deletion bug

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Nasty btree deletion bug
Date: 2006-10-26 15:00:15
Message-ID: 4540CD7F.30909@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> I wrote:
>> I've been analyzing Ed L's recent report of index corruption:
>> http://archives.postgresql.org/pgsql-general/2006-10/msg01183.php

Ouch. That's nasty indeed.

> So I think the rule needs to be "don't delete the rightmost child unless
> it's the only child, in which case you can delete the parent too --- but
> the same restriction must be observed at the next level up".
> ....
> The concept of a half-dead page would remain, but it'd be a transient
> state that would normally only persist for a moment between atomic
> page-delete actions. If we crash between two such actions, the
> half-dead page would remain present, but would be found and cleaned up
> by the next VACUUM. In the meantime it wouldn't cause any problem
> because the keyspace it gives up will belong to a sibling of the same
> parent at whatever level the delete is ultimately supposed to stop at,
> and so inserts and even splits in that keyspace won't create an
> inconsistency.
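
To check one reading of the rule quoted above, here is a toy sketch in
Python. It is not the nbtree code: the Page class and pages_to_delete()
are made-up names, the tree holds no keys or sibling links, and treating
the root as never deletable is an assumption of the sketch. It only
models "which pages may go away together, and when must the delete be
refused".

```python
class Page:
    """Toy tree page: a name, a parent pointer, and ordered children."""

    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def pages_to_delete(page):
    """Return the pages to remove together, bottom-up, or [] to refuse.

    Rule being modelled: never delete a rightmost child unless it is the
    only child; in that case the parent is deleted as well, and the same
    test is repeated one level up.
    """
    chain = [page]
    node = page
    while True:
        parent = node.parent
        if parent is None:
            # Reached the root: refuse (assumption of this sketch).
            return []
        if len(parent.children) == 1:
            # Only child: the parent must go too, so re-check its level.
            chain.append(parent)
            node = parent
            continue
        if parent.children[-1] is node:
            # Rightmost child with siblings: deletion is not allowed.
            return []
        # Non-rightmost child: the chain collected so far can be deleted.
        return chain


# Hypothetical three-level tree used only for illustration.
a1, a2 = Page("a1"), Page("a2")
b1 = Page("b1")
c1, c2 = Page("c1"), Page("c2")
A, B, C = Page("A", [a1, a2]), Page("B", [b1]), Page("C", [c1, c2])
root = Page("root", [A, B, C])

names = lambda pages: [p.name for p in pages]
```

With this tree, a1 can be deleted on its own, a2 is refused because it
is a rightmost child with a sibling, and b1 takes its parent B with it
because B would otherwise be left empty. If B were the rightmost child
of the root, the same call would refuse to delete anything.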

I don't understand how this "in the meantime" thing works. I tried to
work out a step-by-step example, could you take a look at it? See
http://users.tkk.fi/~hlinnaka/pgsql/btree-deletion-bug/

> ...
>
> Comments? Have I missed anything?

It took me a lot of time with pen and paper to understand the issue, and
I'm still not sure I understand it fully. The logic is very complex,
which in itself is bad for maintainability :(.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
