GiST insert algorithm rewrite

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: GiST insert algorithm rewrite
Date: 2010-11-16 16:33:22
Message-ID: 4CE2B252.4060409@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Following the discussions here
(http://archives.postgresql.org/message-id/12375.1289429390@sss.pgh.pa.us),
here's a draft version of a rewrite of the GiST insertion algorithm and
WAL-logging. The goal is to get rid of the tracking of incomplete splits
and inserts in WAL replay, and the fixup activities at the end of recovery.

There are two key changes to the insert algorithm:

1. When we walk down the tree searching for a suitable leaf page to
insert the new tuple to, we update the nodes on the way down so that
they are consistent with the new key we're inserting. The old GiST
algorithm adjusted all the parent pages only after inserting the tuple
on the leaf. Updating them on the way down ensures that the tree is
self-consistent at all times, even if we crash just after inserting the
new tuple on the leaf page.

2. When a page is split, we mark the new left page with a flag to
indicate that the downlink for the page to the right hasn't been
inserted yet. When the downlink is inserted, the flag is cleared. Again
the purpose is to ensure that the tree is self-consistent at all times.
If we crash just after a page split, before the downlink is inserted,
scans will find the tuples on the right page by following the rightlink.
It's slightly less performant, but correct.

The WAL replay code has been simplified accordingly, we no longer need
any fixup code to finish incomplete splits or inserts, and we no longer
need the "invalid" pointers in the tree.

There's no on-disk format changes, except for the additional flag in the
page headers, so this does not affect pg_upgrade. However, if there's
any "invalid" keys in the old index because of an incomplete insertion,
the new code will not understand that. So you should run vacuum to
ensure that there's no such invalid keys in the index before upgrading.
Vacuum will print a message in the log if it finds any, and you will
have to reindex. But that's what it suggests you to do anyway.

I haven't done much testing yet, although it passes the regression
tests. I'll do more testing, and crash testing, and I'll have to
re-review the patch myself before committing.

Comments?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
gist-insert-rewrite-1.patch text/x-diff 103.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2010-11-16 16:41:08 Re: Per-column collation
Previous Message Itagaki Takahiro 2010-11-16 16:32:28 Re: autovacuum maintenance_work_mem