BTree metapage lock and freelist structure

From: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
To: pgsql-hackers(at)postgresql(dot)org
Subject: BTree metapage lock and freelist structure
Date: 2002-10-07 17:13:03
Message-ID: 20021007171303.GB4548@dcc.uchile.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello hackers,

I'm thinking about the btree metapage locking problem.

In the current situation there are three operations that lock the
metapage:
1. when looking for the root page (share lock, "getroot")
2. when setting a new root page (exclusive lock, "setroot")
3. when extending the relation (exclusive lock, "extend").

Now, I want to add three more operations:
4. add a page to the freelist ("addfree")
5. get a page from the freelist ("getfree")
6. shrink a relation ("shrink").

The shrink operation only happens when one tries to add the last page of
the relation to the freelist. I don't know if that's very common, but
in case of relation truncating or massive deletion this is important.

What I want is to be able to do getroot and setroot without being
blocked by any of the other four. Of course the other four are all
mutually exclusive.

There doesn't seem to be a way to acquire two different locks on the
same page, so I propose to lock the InvalidBlockNumer for the btree, and
use that as the lock to be obtained before doing extend, addfree,
getfree or shrink. The setroot/getroot operations would still use the
locking on BTREE_METAPAGE, so they can proceed whether the
InvalidBlockNumber is blocked or not.

On a different topic: the freelist structure I think should be
represented as a freelist int32 number (btm_numfreepages) in
BTMetaPageData, and a pointer to the first BlockNumber. Adding a new
page is done by sequentially scanning the list until a zero is found or
the end of the block is reached. Getting a page sequentially scans the
same list until a blocknumber > 0 is found (iff btm_numfreepages is
greater than zero). This allows for ~ 2000 free pages (very unlikely to
actually happen if the shrink operation is in place).

Comments? Another solution would be to have a separate page for the
freelist, but it seems to be a waste.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Porque francamente, si para saber manejarse a uno mismo hubiera que
rendir examen... ¿Quién es el machito que tendría carnet?" (Mafalda)

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sandeep Chadha 2002-10-07 17:21:06 Hot Backup
Previous Message Michael Paesold 2002-10-07 17:12:14 Re: Table spaces again [was Re: Threaded Sorting]