Bizarre coding in _bt_binsrch

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Bizarre coding in _bt_binsrch
Date: 1999-06-06 00:40:50
Message-ID: 8552.928629650@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I have been puzzling out the coding in _bt_binsrch() in
backend/access/nbtree/nbtsearch.c, with an eye to speeding it up for
the many-equal-keys case. I have finally worked out exactly what it's
doing, to wit:

* On a leaf page, we always return the first key >= scan key
* (which could be the last slot + 1).
*
* On a non-leaf page, there are special cases:
*
* For an insertion (srchtype != BT_DESCENT and natts == keysz)
* always return first key >= scan key (which could be off the end).
*
* For a standard search (srchtype == BT_DESCENT and natts == keysz)
* return the first equal key if one exists, else the last lesser key
* if one exists, else the first slot on the page.
*
* For a partial-match search (srchtype == BT_DESCENT and natts < keysz)
* return the last lesser key if one exists, else the first slot.

This strikes me as a tad bizarre --- in particular, the discrepancy
between treatment of equal keys in the normal and partial search cases.

I think I understand why the partial-match code works that way: there
could be matching keys in the sub-page belonging to the last lesser key.
For example, if our scan target is (x = 2) and we have internal keys
(x = 1, y = 2)
(x = 2, y = 1)
then we need to look at the first key's subpages, where we might find
matching keys like (x = 2, y = 0).

The full-width case appears to assume that that can't happen: if we
have a given key in an upper page, there can be *no* equal keys in
subpages to its left. That's a rather strong assumption about how
page splitting is done; is it correct?

Even more to the point, *should* it be correct? If we always returned
the last lesser key, then the code would work for any correctly
sequenced b-tree, but this code looks like it works only if splits occur
only at the leftmost of a group of equal keys. If there are a lot of
equal keys, that could result in a badly unbalanced tree, no? Maybe
that's the real reason why performance seems to be so bad for many
equal keys... maybe the split algorithm needs to be relaxed?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Don Baccus 1999-06-06 02:19:29 Re: [HACKERS] Priorities for 6.6
Previous Message Tom Lane 1999-06-05 20:58:23 Re: [HACKERS] Priorities for 6.6