Tom Lane wrote:
> In theory, given a slow-enough-moving VACUUM process, this could happen
> even without a crash. So I think that means we have to go over to the
> other plan of locking everything all the way up to the top of the
> deletion before we start doing it --- and also, we'll need crash
> recovery logic to complete the incomplete deletion, same as for
> incomplete inserts. Yech --- more code than I was hoping to have to
> write, and more concurrency hit too.
ISTM we don't necessarily need the "complete the incomplete deletion"
logic. Since we're holding all the pages up to the parent of the highest
deleted page locked, we can atomically issue one big WAL record covering
all the modifications. We can't do that with page splits, because we
want to release the lock on the split pages before we move up to the
parent to insert the child pointer. (whether or not it's worth it in
page splits either is another issue)
The concurrency hit probably isn't that bad. There shouldn't be much
activity on empty pages.
What does the original research paper by Lanin & Shasha say about this?
I don't have it at hand. I do have the Lehman & Yao paper but that one
doesn't discuss removal of non-leaf pages at all.
> OTOH we might be able to get rid of the notion of "half dead", which
> would allow some simplifications.
> Thanks for taking a close look!
No problem, I've been staring at the b-tree code lately anyway.
In response to
pgsql-hackers by date
|Next:||From: Tom Lane||Date: 2006-10-26 16:31:50|
|Subject: Re: Nasty btree deletion bug |
|Previous:||From: Bruce Momjian||Date: 2006-10-26 16:21:57|
|Subject: Re: [HACKERS] Replication documentation addition|