Re: Unsplitting btree index leaf pages

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unsplitting btree index leaf pages
Date: 2005-12-22 15:59:16
Message-ID: 20051222155915.GG21783@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 22, 2005 at 10:40:24AM -0500, Tom Lane wrote:
> And you evidently still didn't understand it. Locking both pages does
> not fix the problem, because it doesn't guarantee that there's not a
> concurrent indexscan in flight from one to the other. If you move items
> from one page to the other in the opposite direction from the way the
> scan is going, then it will miss those items. If we try to fix this by
> making scans lock one page before releasing the previous, then we'll
> create a bunch of deadlock cases.

I've been thinking, maybe there's a lockless way of going about this?
Have some kind of index modification identifier that you set at the
beginning of the index scan. What you do is mark the tuples you want to
move with and IMI (I'm trying to avoid the word transaction here) and then
copy the tuples to the new page with IMI+1. Any currently running index
scan will notice the higher IMI and ignore them. When all old index
scans are done, you can remove the old ones.

It's sort of a mini-MVCC for indexes except you could probably just use
a few states: visible to all, visible to current scan, invisible to
current scan. Or use two bits to represent frozen, 1, 2 and 3. A plain
VACUUM could do the state transistions each time it moves through the
index.

The downsides are probably that it's a pain to make it work
concurrently and requires writing each index page at least twice. But
it's a thought...

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2005-12-22 17:00:24 Re: PL/pgSQL proposal: using list of scalars in assign stmts, fore and fors stmts
Previous Message Tom Lane 2005-12-22 15:40:24 Re: Unsplitting btree index leaf pages