Re: Think I see a btree vacuuming bug

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Think I see a btree vacuuming bug
Date: 2002-08-26 20:25:55
Message-ID: 2521.1030393555@sss.pgh.pa.us
Lists: pgsql-hackers

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Is this fixed, and if not, can I have some TODO text?

It's not fixed. I'd like to fix it for 7.3, but I was hoping someone
would think of a better way to fix it than I did ...

regards, tom lane

> ---------------------------------------------------------------------------

> Tom Lane wrote:
>> If a VACUUM running concurrently with someone else's indexscan were to
>> delete the index tuple that the indexscan is currently stopped on, then
>> we'd get a failure when the indexscan resumes and tries to re-find its
>> place. (This is the infamous "my bits moved right off the end of the
>> world" error condition.) What is supposed to prevent that from
>> happening is that the indexscan retains a buffer pin (but not a read
>> lock) on the index page containing the tuple it's stopped on. VACUUM
>> will not delete any tuple until it can get a "super exclusive" lock on
>> the page (cf. LockBufferForCleanup), and the pin prevents it from doing
>> so.
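The rule in the paragraph above can be pictured with a toy model. A page carries a pin count, and VACUUM may delete from it only when its own pin is the only one. That is the condition LockBufferForCleanup waits for. The sketch is a hypothetical single-threaded Python model, not PostgreSQL code. `Page` and `vacuum_delete` are illustrative names, and so is the behavior of reporting failure instead of waiting.

```python
class Page:
    """Toy index page: a pin count and a list of index keys.

    Hypothetical model for illustration only; the names do not correspond
    to PostgreSQL's buffer-manager API.
    """

    def __init__(self, name, keys):
        self.name = name
        self.keys = list(keys)
        self.pins = 0

    def pin(self):
        self.pins += 1

    def unpin(self):
        assert self.pins > 0, "unpin without pin"
        self.pins -= 1


def vacuum_delete(page, key):
    """Delete key only if a "super exclusive" (cleanup) lock is obtainable.

    As with the rule behind LockBufferForCleanup, the lock counts as granted
    only when VACUUM's own pin is the sole pin on the page. Instead of
    waiting, this sketch just reports failure so the caller can retry.
    """
    page.pin()  # VACUUM's own pin
    try:
        if page.pins != 1:
            return False  # another backend (e.g. a stopped indexscan) pins it
        page.keys.remove(key)
        return True
    finally:
        page.unpin()


leaf = Page("leaf", [10, 20, 30])
leaf.pin()                         # indexscan stops on key 20, keeps its pin
blocked = vacuum_delete(leaf, 20)  # VACUUM cannot get the cleanup lock
leaf.unpin()                       # indexscan resumes and moves on
deleted = vacuum_delete(leaf, 20)  # now the deletion goes through
print(blocked, deleted, leaf.keys)  # False True [10, 30]
```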
>>
>> However: suppose that some other activity causes the index page to be
>> split while the indexscan is stopped, and that the tuple it's stopped
>> on gets relocated into the new righthand page of the pair. Then the
>> indexscan is holding a pin on the wrong page --- not the one its tuple
>> is in. It would then be possible for the VACUUM to arrive at the tuple
>> and delete it before the indexscan is resumed.
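The failure case can be replayed in the same kind of toy model. Assumptions: `Page`, `split`, `vacuum_delete`, and `refind` are hypothetical names. A split moves the upper half of the keys into a new right sibling. `refind` stands in for the scan re-finding its place by moving right from the page it remembered. The scan's pin stays on A while its tuple moves to A', so nothing stops VACUUM from deleting it.

```python
class Page:
    """Toy leaf page: keys, a pin count, and a right-sibling link."""

    def __init__(self, name, keys, right=None):
        self.name, self.keys, self.pins, self.right = name, list(keys), 0, right


def split(page):
    """Move the upper half of page's keys into a new right sibling.

    As the scenario above assumes, a pin held by a stopped indexscan does
    not prevent the split.
    """
    mid = len(page.keys) // 2
    new = Page(page.name + "'", page.keys[mid:], page.right)
    page.keys = page.keys[:mid]
    page.right = new
    return new


def vacuum_delete(page, key):
    """Delete key if no other backend pins page (stand-in for a cleanup lock)."""
    if page.pins or key not in page.keys:
        return False
    page.keys.remove(key)
    return True


def refind(start, key):
    """Resume by searching right from the remembered page for key."""
    page = start
    while page is not None:
        if key in page.keys:
            return page
        page = page.right
    return None


a = Page("A", [10, 20, 30, 40])
a.pins += 1                      # indexscan stops on key 30, pinning A
a2 = split(a)                    # 30 and 40 move to the new page A'
deleted = vacuum_delete(a2, 30)  # nothing pins A', so VACUUM proceeds
print(deleted, refind(a, 30))    # True None: the resumed scan is lost
```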
>>
>> This is a pretty low-probability scenario, especially given the new
>> index-tuple-killing mechanism (which renders it less likely that an
>> indexscan will stop on a vacuum-able tuple). But it could happen.
>>
>> The only solution I've thought of is to make btbulkdelete acquire
>> "super exclusive" lock on *every* leaf page of the index as it scans,
>> rather than only locking the pages it actually needs to delete something
>> from. And we'd need to tweak _bt_restscan to chain its pins (pin the
>> next page to the right before releasing pin on the previous page).
>> This would prevent a btbulkdelete scan from overtaking ordinary
>> indexscans, and thereby ensure that it couldn't arrive at the tuple
>> on which an indexscan is stopped, even with splitting.
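The proposed fix can be sketched the same way. The names `Scan`, `BulkDelete`, and `split` are illustrative, not PostgreSQL's. In this hypothetical model, the scan pins its right sibling before dropping the pin it already holds. The bulk-delete pass must get a cleanup lock on every leaf page and waits whenever another pin is present. The model only covers a VACUUM pass that starts at or to the left of the scan, and it ignores real locking and concurrency. Its purpose is to show why a tuple that a split moves right stays out of VACUUM's reach.

```python
class Page:
    """Toy leaf page: keys, a pin count, and a right-sibling link."""

    def __init__(self, name, keys, right=None):
        self.name, self.keys, self.pins, self.right = name, list(keys), 0, right


def split(page):
    """Move the upper half of page's keys into a new right sibling."""
    mid = len(page.keys) // 2
    new = Page(page.name + "'", page.keys[mid:], page.right)
    page.keys = page.keys[:mid]
    page.right = new
    return new


class Scan:
    """Indexscan that chains its pins when it steps right (hypothetical)."""

    def __init__(self, start):
        self.page = start
        start.pins += 1

    def step_right(self):
        nxt = self.page.right
        if nxt is not None:
            nxt.pins += 1       # pin the right sibling first ...
        self.page.pins -= 1     # ... and only then drop the old pin
        self.page = nxt


class BulkDelete:
    """VACUUM pass that needs a cleanup lock on *every* leaf page it visits."""

    def __init__(self, start, dead):
        self.page, self.dead, self.log = start, set(dead), []

    def step(self):
        if self.page is None:
            return
        if self.page.pins:      # another backend's pin: cannot lock, so wait
            self.log.append("wait " + self.page.name)
            return
        self.page.keys = [k for k in self.page.keys if k not in self.dead]
        self.log.append("clean " + self.page.name)
        self.page = self.page.right


c = Page("C", [50, 60])
b = Page("B", [30, 40], c)
a = Page("A", [10, 20], b)

scan = Scan(a)
vac = BulkDelete(a, dead={40})  # key 40 points at a dead heap tuple
vac.step()            # blocked at A by the scan's pin
scan.step_right()     # scan now stopped on key 40 in B
b2 = split(b)         # concurrent split moves key 40 into new page B'
vac.step()            # cleans A, advances to B
vac.step()            # blocked at B: cannot overtake the scan
assert 40 in b2.keys  # the tuple the scan is stopped on survives
scan.step_right()     # scan re-finds 40 by moving right to B'
vac.step()            # cleans B, advances to B'
vac.step()            # blocked at B'
print(vac.log)
```

In the log, VACUUM alternates between waiting at a page and cleaning it. It never gets past the page the scan has pinned. That waiting at every page a scan is sitting on is the slowdown worried about in the next paragraph.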
>>
>> I'm somewhat concerned that the more stringent locking will slow down
>> VACUUM a good deal when there's lots of concurrent activity, but I don't
>> see another answer. Ideas anyone?
>>
>> regards, tom lane
