Re: Think I see a btree vacuuming bug

From: "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au>
To: <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Think I see a btree vacuuming bug
Date: 2002-05-26 01:24:33
Message-ID: 010201c20454$17b30f50$9865fea9@Allan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Well, given that vacuum does its work in the background now - I think you'll
be hard pressed to find a sys admin who'll vote for leaving it as is, no
matter how small the chance of corruption.

However - this isn't my area of expertise...

Chris

----- Original Message -----
From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: <pgsql-hackers(at)postgresql(dot)org>
Sent: Saturday, May 25, 2002 11:21 AM
Subject: [HACKERS] Think I see a btree vacuuming bug

> If a VACUUM running concurrently with someone else's indexscan were to
> delete the index tuple that the indexscan is currently stopped on, then
> we'd get a failure when the indexscan resumes and tries to re-find its
> place. (This is the infamous "my bits moved right off the end of the
> world" error condition.) What is supposed to prevent that from
> happening is that the indexscan retains a buffer pin (but not a read
> lock) on the index page containing the tuple it's stopped on. VACUUM
> will not delete any tuple until it can get a "super exclusive" lock on
> the page (cf. LockBufferForCleanup), and the pin prevents it from doing
> so.
>
> However: suppose that some other activity causes the index page to be
> split while the indexscan is stopped, and that the tuple it's stopped
> on gets relocated into the new righthand page of the pair. Then the
> indexscan is holding a pin on the wrong page --- not the one its tuple
> is in. It would then be possible for the VACUUM to arrive at the tuple
> and delete it before the indexscan is resumed.
>
> This is a pretty low-probability scenario, especially given the new
> index-tuple-killing mechanism (which renders it less likely that an
> indexscan will stop on a vacuum-able tuple). But it could happen.
>
> The only solution I've thought of is to make btbulkdelete acquire
> "super exclusive" lock on *every* leaf page of the index as it scans,
> rather than only locking the pages it actually needs to delete something
> from. And we'd need to tweak _bt_restscan to chain its pins (pin the
> next page to the right before releasing pin on the previous page).
> This would prevent a btbulkdelete scan from overtaking ordinary
> indexscans, and thereby ensure that it couldn't arrive at the tuple
> on which an indexscan is stopped, even with splitting.
>
> I'm somewhat concerned that the more stringent locking will slow down
> VACUUM a good deal when there's lots of concurrent activity, but I don't
> see another answer. Ideas anyone?
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Marc G. Fournier 2002-05-26 03:32:40 Re: Redhat 7.3 time manipulation bug
Previous Message Bruce Momjian 2002-05-25 22:43:59 Re: Temp tables are curious creatures....