Think I see a btree vacuuming bug

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Think I see a btree vacuuming bug
Date: 2002-05-25 18:21:52
Message-ID: 27838.1022350912@sss.pgh.pa.us
Lists: pgsql-hackers

If a VACUUM running concurrently with someone else's indexscan were to
delete the index tuple that the indexscan is currently stopped on, then
we'd get a failure when the indexscan resumes and tries to re-find its
place. (This is the infamous "my bits moved right off the end of the
world" error condition.) What is supposed to prevent that from
happening is that the indexscan retains a buffer pin (but not a read
lock) on the index page containing the tuple it's stopped on. VACUUM
will not delete any tuple until it can get a "super exclusive" lock on
the page (cf. LockBufferForCleanup), and the pin prevents it from doing
so.
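
A minimal sketch of that interlock, as a toy model rather than PostgreSQL's actual buffer manager. ToyPage and the toy_* functions are hypothetical names made up for illustration; the only thing modeled is that the cleanup lock is available only when the caller's own pin is the sole pin on the page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy page: tracks only how many backends have it pinned. */
typedef struct {
    int pins;
} ToyPage;

static void toy_pin(ToyPage *page)
{
    page->pins++;
}

static void toy_unpin(ToyPage *page)
{
    assert(page->pins > 0);
    page->pins--;
}

/*
 * Toy version of the cleanup-lock condition: the caller must already hold
 * a pin, and the lock is available only when that is the sole pin.
 * (The real LockBufferForCleanup waits; this only reports the condition.)
 */
static bool toy_cleanup_lock_available(const ToyPage *page)
{
    return page->pins == 1;
}

/* Returns true if VACUUM could delete from a page a scan is stopped on. */
static bool vacuum_can_delete_while_scan_stopped(void)
{
    ToyPage page = {0};
    bool ok;

    toy_pin(&page);     /* indexscan stops here and keeps its pin */
    toy_pin(&page);     /* VACUUM pins the page before asking for the lock */
    ok = toy_cleanup_lock_available(&page);
    toy_unpin(&page);
    toy_unpin(&page);
    return ok;
}
```

In this model vacuum_can_delete_while_scan_stopped() returns false: the scan's pin alone is enough to hold VACUUM off, with no read lock needed.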

However: suppose that some other activity causes the index page to be
split while the indexscan is stopped, and that the tuple it's stopped
on gets relocated into the new righthand page of the pair. Then the
indexscan is holding a pin on the wrong page --- not the one its tuple
is in. It would then be possible for the VACUUM to arrive at the tuple
and delete it before the indexscan is resumed.
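
The failure sequence can be sketched with a similar toy model. Again, ToyPage, toy_split, toy_vacuum_delete and toy_refind are hypothetical stand-ins, not the real nbtree routines; pages hold a few integer "tuples" and a split simply moves the upper half to a new right sibling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ITEMS 4

/* Hypothetical toy leaf page: a pin count, a few items, a right link. */
typedef struct ToyPage {
    int pins;
    int nitems;
    int items[MAX_ITEMS];
    struct ToyPage *right;
} ToyPage;

/* Move the upper half of 'left' into the empty page 'right' and link them. */
static void toy_split(ToyPage *left, ToyPage *right)
{
    int keep = left->nitems / 2;

    assert(left->nitems >= 2);
    right->nitems = 0;
    for (int i = keep; i < left->nitems; i++)
        right->items[right->nitems++] = left->items[i];
    left->nitems = keep;
    right->right = left->right;
    left->right = right;
}

/* Toy VACUUM: delete 'item' only if nobody else has the page pinned. */
static bool toy_vacuum_delete(ToyPage *page, int item)
{
    bool deleted = false;

    page->pins++;                       /* VACUUM's own pin */
    if (page->pins == 1) {              /* sole pinner: cleanup lock OK */
        for (int i = 0; i < page->nitems; i++) {
            if (page->items[i] == item) {
                for (int j = i; j < page->nitems - 1; j++)
                    page->items[j] = page->items[j + 1];
                page->nitems--;
                deleted = true;
                break;
            }
        }
    }
    page->pins--;
    return deleted;
}

/* Toy re-find: search rightward from the page the scan has pinned. */
static bool toy_refind(const ToyPage *start, int item)
{
    for (const ToyPage *p = start; p != NULL; p = p->right)
        for (int i = 0; i < p->nitems; i++)
            if (p->items[i] == item)
                return true;
    return false;
}

/* Returns true if the scan's tuple was deleted out from under it. */
static bool scan_loses_its_tuple(void)
{
    ToyPage left = { .pins = 0, .nitems = 4, .items = {10, 20, 30, 40}, .right = NULL };
    ToyPage right = {0};
    const int stopped_on = 40;
    bool deleted, found;

    left.pins++;                        /* scan stops on 40, pins 'left' */
    toy_split(&left, &right);           /* 30 and 40 move to 'right' */
    deleted = toy_vacuum_delete(&right, stopped_on);  /* no pin on 'right' */
    found = toy_refind(&left, stopped_on);
    left.pins--;
    return deleted && !found;
}
```

Here scan_loses_its_tuple() returns true: the scan's pin is still on the left page, so nothing stops the toy VACUUM from cleaning the right page, and the re-find walks off the end of the chain.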

This is a pretty low-probability scenario, especially given the new
index-tuple-killing mechanism (which renders it less likely that an
indexscan will stop on a vacuum-able tuple). But it could happen.

The only solution I've thought of is to make btbulkdelete acquire
"super exclusive" lock on *every* leaf page of the index as it scans,
rather than only locking the pages it actually needs to delete something
from. And we'd need to tweak _bt_restscan to chain its pins (pin the
next page to the right before releasing pin on the previous page).
This would prevent a btbulkdelete scan from overtaking ordinary
indexscans, and thereby ensure that it couldn't arrive at the tuple
on which an indexscan is stopped, even with splitting.
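
A rough sketch of why the two changes together keep VACUUM behind the scan, under the same toy assumptions as before. ToyPage, toy_step_right and toy_vacuum_progress are hypothetical names; the model covers forward movement along the leaf chain only, and a blocked VACUUM is shown as stopping where the real one would wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy leaf page. */
typedef struct ToyPage {
    int pins;               /* pins held by scans (VACUUM's own not counted) */
    bool has_dead_tuple;    /* does VACUUM want to delete something here? */
    struct ToyPage *right;
} ToyPage;

/*
 * Pin chaining: pin the right sibling before dropping the pin on the
 * current page, so the scan is never left holding no pin at all.
 */
static ToyPage *toy_step_right(ToyPage *cur)
{
    ToyPage *next = cur->right;

    assert(cur->pins > 0);
    if (next != NULL)
        next->pins++;
    cur->pins--;
    return next;
}

/*
 * Walk the leaf chain left to right and return how many pages VACUUM gets
 * through before it blocks on a page pinned by a scan. With
 * lock_every_page set, every leaf needs the cleanup lock (the proposal);
 * otherwise only pages with something to delete do.
 */
static int toy_vacuum_progress(const ToyPage *first, bool lock_every_page)
{
    int passed = 0;

    for (const ToyPage *p = first; p != NULL; p = p->right) {
        bool needs_lock = lock_every_page || p->has_dead_tuple;

        if (needs_lock && p->pins > 0)
            break;          /* real VACUUM would wait here, not give up */
        passed++;
    }
    return passed;
}

/* Post-split state: scan pins 'left', its tuple now lives in 'right'. */
static bool vacuum_stays_behind_scan(void)
{
    ToyPage right = { .pins = 0, .has_dead_tuple = true, .right = NULL };
    ToyPage left = { .pins = 1, .has_dead_tuple = false, .right = &right };

    int old_policy = toy_vacuum_progress(&left, false);  /* reaches 'right' */
    int new_policy = toy_vacuum_progress(&left, true);   /* stuck at 'left' */
    ToyPage *cur = toy_step_right(&left);                /* scan moves right */
    int after_step = toy_vacuum_progress(&left, true);   /* stuck at 'right' */

    return old_policy == 2 && new_policy == 0 &&
           cur == &right && after_step == 1;
}
```

Under the old policy the toy VACUUM skips the pinned left page (nothing to delete there) and gets to the right page; when every leaf needs the lock, it cannot pass the page the scan has pinned, and pin chaining keeps that true as the scan moves right.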

I'm somewhat concerned that the more stringent locking will slow down
VACUUM a good deal when there's lots of concurrent activity, but I don't
see another answer. Ideas anyone?

regards, tom lane
