Skip site navigation (1) Skip section navigation (2)

Re: Resurrecting per-page cleaner for btree

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org, teramoto(dot)junji(at)lab(dot)ntt(dot)co(dot)jp
Subject: Re: Resurrecting per-page cleaner for btree
Date: 2006-07-25 18:02:31
Message-ID: 18371.1153850551@sss.pgh.pa.us (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-patches
ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> I think we can resurrect his idea because we will scan btree pages
> at-atime now; the missing-restarting-point problem went away.

> Have I missed something? Comments welcome.

I was thinking for awhile just now that this would break the interlock
that guarantees VACUUM can't delete a heap tuple that an indexscanning
process is about to visit.  After further thought, it doesn't, but it's
non-obvious.  I've added the attached commentary to nbtree/README:


On-the-fly deletion of index tuples
-----------------------------------

If a process visits a heap tuple and finds that it's dead and removable
(ie, dead to all open transactions, not only that process), then we can
return to the index and mark the corresponding index entry "known dead",
allowing subsequent index scans to skip visiting the heap tuple.  The
"known dead" marking uses the LP_DELETE bit in ItemIds.  This is currently
only done in plain indexscans, not bitmap scans, because only plain scans
visit the heap and index "in sync" and so there's not a convenient way
to do it for bitmap scans.

Once an index tuple has been marked LP_DELETE it can actually be removed
from the index immediately; since index scans only stop "between" pages,
no scan can lose its place from such a deletion.  We separate the steps
because we allow LP_DELETE to be set with only a share lock (it's exactly
like a hint bit for a heap tuple), but physically removing tuples requires
exclusive lock.  In the current code we try to remove LP_DELETE tuples when
we are otherwise faced with having to split a page to do an insertion (and
hence have exclusive lock on it already).

This leaves the index in a state where it has no entry for a dead tuple
that still exists in the heap.  This is not a problem for the current
implementation of VACUUM, but it could be a problem for anything that
explicitly tries to find index entries for dead tuples.  (However, the
same situation is created by REINDEX, since it doesn't enter dead
tuples into the index.)

It's sufficient to have an exclusive lock on the index page, not a
super-exclusive lock, to do deletion of LP_DELETE items.  It might seem
that this breaks the interlock between VACUUM and indexscans, but that is
not so: as long as an indexscanning process has a pin on the page where
the index item used to be, VACUUM cannot complete its btbulkdelete scan
and so cannot remove the heap tuple.  This is another reason why
btbulkdelete has to get super-exclusive lock on every leaf page, not only
the ones where it actually sees items to delete.


			regards, tom lane

In response to

pgsql-hackers by date

Next:From: Bort, PaulDate: 2006-07-25 18:15:57
Subject: Re: Adding a pgbench run to buildfarm
Previous:From: Jim C. NasbyDate: 2006-07-25 17:59:59
Subject: Re: Adding a pgbench run to buildfarm

pgsql-patches by date

Next:From: Tom LaneDate: 2006-07-25 19:37:58
Subject: Re: Resurrecting per-page cleaner for btree
Previous:From: Hiroshi SaitoDate: 2006-07-25 17:12:31
Subject: Re: [HACKERS] Patch for VS.Net 2005's strxfrm() bug

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group