Re: Documentation of bt_page_items()'s ctid field

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Documentation of bt_page_items()'s ctid field
Date: 2014-12-30 20:21:25
Message-ID: 54A30945.60306@vmware.com
Lists: pgsql-hackers

On 12/30/2014 10:07 PM, Peter Geoghegan wrote:
> On Tue, Dec 30, 2014 at 8:59 AM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>> How much detail on the b-tree internals do we want to put in the pageinspect
>> documentation? I can see that being useful, but should we also explain e.g.
>> that the first item on each (non-rightmost) page is the high key?
>
> Maybe we should. I see no reason not to, and I think that it makes
> sense to explain things at that level without going into flags and so
> on. But don't forget that that isn't quite the full story if we're
> going to talk about high keys at all; we must also explain "minus
> infinity" keys, alongside any explanation of the high key:

Yeah, good point.

> * CRUCIAL NOTE: on a non-leaf page, the first data key is assumed to be
> * "minus infinity": this routine will always claim it is less than the
> * scankey. The actual key value stored (if any, which there probably isn't)
> * does not matter. This convention allows us to implement the Lehman and
> * Yao convention that the first down-link pointer is before the first key.
> * See backend/access/nbtree/README for details.
>
> In particular, this means that the key data is garbage, which is
> something I've also seen causing confusion [1].

In practice, we never store any actual key value for the "minus
infinity" key. I guess the code would ignore it if it were there, but it
would make more sense to explain that the first data key on an internal
page does not have a key value. If there is a value there, it's a sign
that something's wrong.
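
To make the two conventions concrete (high key first on a non-rightmost page, "minus infinity" first data key on an internal page), here is a toy model in Python. It is only a sketch of the rule described in the quoted comment, not the nbtree code itself: the `Page` class, `first_data_offset()` and `compare()` are hypothetical names, and item offsets are 1-based as in bt_page_items() output.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical, simplified stand-in for a B-tree page."""
    is_leaf: bool
    is_rightmost: bool
    keys: list  # one entry per item; None where no key value is stored


def first_data_offset(page):
    # On a non-rightmost page, item 1 is the high key, so data items start at 2.
    return 1 if page.is_rightmost else 2


def compare(page, scankey, offset):
    """Return >0, 0 or <0 as scankey is greater than, equal to or less than the item."""
    if not page.is_leaf and offset == first_data_offset(page):
        # "Minus infinity": the item is always claimed to be less than the
        # scankey, and whatever key value is stored (normally none) is ignored.
        return 1
    key = page.keys[offset - 1]
    return (scankey > key) - (scankey < key)


# Internal, non-rightmost page: high key 50, then a keyless first data item.
internal = Page(is_leaf=False, is_rightmost=False, keys=[50, None, 20, 35])
print(compare(internal, -10**9, 2))  # the scankey still counts as greater
```

In this model, a non-None key at the first data offset of an internal page would be exactly the "sign that something's wrong" case.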

> I would like to make it easier for competent non-experts on the B-Tree
> code to eyeball a B-Tree with pageinspect, and be reasonably confident
> that things add up. In order for such people to know that something is
> wrong, we should explain what "right" looks like in moderate detail.

Makes sense.

>> I had a hard time understanding the remark about the root page. But in any
>> case, if you look at the flags set e.g. with bt_page_stats(), the root page
>> is flagged as also being a leaf page, when it is the only page in the index.
>> So the root page is considered also a leaf page in that case.
>
> I think that a better way of handling that originally would have been
> to make root-ness a separate property from leaf-ness/internal-ness.

Hmm, yeah, bt_page_stats() currently returns 'l' in the type column when
the page's flags are (BTP_ROOT | BTP_LEAF).
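
A rough sketch of why that happens, assuming the type is derived by testing leaf-ness before root-ness. The flag values, the `page_type()` helper and the 'r'/'i' letters are assumptions for illustration; only the 'l' result for a root that is also a leaf is stated above.

```python
# Assumed flag bits, for illustration only.
BTP_LEAF = 1 << 0
BTP_ROOT = 1 << 1


def page_type(flags):
    """Hypothetical classifier: leaf-ness is checked first, so it wins over root-ness."""
    if flags & BTP_LEAF:
        return 'l'
    if flags & BTP_ROOT:
        return 'r'
    return 'i'


# A single-page index: the root is also the only leaf.
print(page_type(BTP_ROOT | BTP_LEAF))
```

Because a single character has to summarize two independent properties, root-ness is lost whenever the page is also a leaf.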

- Heikki
