Re: [HACKERS] Bizarre coding in _bt_binsrch

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Bizarre coding in _bt_binsrch
Date: 1999-11-29 22:24:37
Message-ID: 199911292224.RAA15252@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Tom, I assume you have dealt with this, right?

> I have been puzzling out the coding in _bt_binsrch() in
> backend/access/nbtree/nbtsearch.c, with an eye to speeding it up for
> the many-equal-keys case. I have finally worked out exactly what it's
> doing, to wit:
>
> * On a leaf page, we always return the first key >= scan key
> * (which could be the last slot + 1).
> *
> * On a non-leaf page, there are special cases:
> *
> * For an insertion (srchtype != BT_DESCENT and natts == keysz)
> * always return first key >= scan key (which could be off the end).
> *
> * For a standard search (srchtype == BT_DESCENT and natts == keysz)
> * return the first equal key if one exists, else the last lesser key
> * if one exists, else the first slot on the page.
> *
> * For a partial-match search (srchtype == BT_DESCENT and natts < keysz)
> * return the last lesser key if one exists, else the first slot.
>
> This strikes me as a tad bizarre --- in particular, the discrepancy
> between treatment of equal keys in the normal and partial search cases.
>
> I think I understand why the partial-match code works that way: there
> could be matching keys in the sub-page belonging to the last lesser key.
> For example, if our scan target is (x = 2) and we have internal keys
> (x = 1, y = 2)
> (x = 2, y = 1)
> then we need to look at the first key's subpages, where we might find
> matching keys like (x = 2, y = 0).
>
> The full-width case appears to assume that that can't happen: if we
> have a given key in an upper page, there can be *no* equal keys in
> subpages to its left. That's a rather strong assumption about how
> page splitting is done; is it correct?
>
> Even more to the point, *should* it be correct? If we always returned
> the last lesser key, then the code would work for any correctly
> sequenced b-tree, but this code looks like it works only if splits occur
> only at the leftmost of a group of equal keys. If there are a lot of
> equal keys, that could result in a badly unbalanced tree, no? Maybe
> that's the real reason why performance seems to be so bad for many
> equal keys... maybe the split algorithm needs to be relaxed?
>
> regards, tom lane
>
>

--
Bruce Momjian | http://www.op.net/~candle
maillist(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 1999-11-29 22:25:02 Re: [HACKERS] destroydb doesn't close connection with client (httpd <-> pg)
Previous Message The Hermit Hacker 1999-11-29 20:55:03 Re: [HACKERS] VACUUM as a denial-of-service attack