Re: GiST insert algorithm rewrite

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: GiST insert algorithm rewrite
Date: 2010-11-16 18:01:04
Message-ID: 2029.1289930464@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> There are two key changes to the insert algorithm:

> 1. When we walk down the tree searching for a suitable leaf page to
> insert the new tuple to, we update the nodes on the way down so that
> they are consistent with the new key we're inserting. The old GiST
> algorithm adjusted all the parent pages only after inserting the tuple
> on the leaf. Updating them on the way down ensures that the tree is
> self-consistent at all times, even if we crash just after inserting the
> new tuple on the leaf page.

> 2. When a page is split, we mark the new left page with a flag to
> indicate that the downlink for the page to the right hasn't been
> inserted yet. When the downlink is inserted, the flag is cleared. Again
> the purpose is to ensure that the tree is self-consistent at all times.
> If we crash just after a page split, before the downlink is inserted,
> scans will find the tuples on the right page by following the rightlink.
> It's slightly less performant, but correct.

The one thought that comes to mind is how does the flag business work
after multiple splittings? That is, assume we have a page that has the
flag set because of a previous crash. If we have to split either that
page or its right sibling, what do we do with the flags? I think that
it can be made to work, so long as searches keep moving right as long
as the flag is set. But this needs to be thought through, and
documented in the README file. I'm particularly worried whether the
after-the-fact fixup that you mention in README might fail in such
scenarios.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Sabino Mullane 2010-11-16 18:04:13 Re: GCC vs clang
Previous Message Eric Davies 2010-11-16 17:31:43 Re: SQL/MED estimated time of arrival?