Re: Documentation of bt_page_items()'s ctid field

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Documentation of bt_page_items()'s ctid field
Date: 2014-12-30 20:07:34
Message-ID: CAM3SWZTyRMDgHyikJu_Dsx046jRzoTR94YRd+iB5R7tq1hv2aQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 30, 2014 at 8:59 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> How much detail on the b-tree internals do we want to put in the pageinspect
> documentation? I can see that being useful, but should we also explain e.g.
> that the first item on each (non-rightmost) page is the high key?

Maybe we should. I see no reason not to, and I think that it makes
sense to explain things at that level without going into flags and so
on. But high keys alone aren't the full story: if we're going to talk
about them at all, we must also explain "minus infinity" keys
alongside any explanation of the high key:

* CRUCIAL NOTE: on a non-leaf page, the first data key is assumed to be
* "minus infinity": this routine will always claim it is less than the
* scankey. The actual key value stored (if any, which there probably isn't)
* does not matter. This convention allows us to implement the Lehman and
* Yao convention that the first down-link pointer is before the first key.
* See backend/access/nbtree/README for details.

In particular, this means that the key data reported for the first
data item on an internal page is garbage, which is something I've also
seen causing confusion [1].
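
To make the convention concrete, here is a minimal sketch in C of a
comparison routine that follows the rule in the comment above. The
SketchPage type and the sketch_* functions are hypothetical and heavily
simplified (integer keys, no real page layout); this is not the actual
nbtree code, only an illustration of the high key and "minus infinity"
rules as described in this thread.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified page model; not PostgreSQL's real structs. */
typedef struct SketchPage
{
    bool        is_leaf;        /* true for leaf pages */
    bool        is_rightmost;   /* rightmost page on its level: no high key */
    size_t      nitems;         /* number of items; offsets are 1-based */
    const int  *keys;           /* keys[off - 1] is the key at offset off */
} SketchPage;

/*
 * Offset of the first data item. On a non-rightmost page, offset 1 holds
 * the high key, so data items start at offset 2.
 */
static size_t
sketch_first_data_key(const SketchPage *page)
{
    return page->is_rightmost ? 1 : 2;
}

/*
 * Compare scankey to the item at offset off.
 * Returns <0, 0, or >0 if scankey is less than, equal to, or greater
 * than the item's key.
 */
static int
sketch_compare(const SketchPage *page, int scankey, size_t off)
{
    /*
     * "Minus infinity" rule: on a non-leaf page the first data key is
     * treated as less than any scankey, whatever value is stored there.
     */
    if (!page->is_leaf && off == sketch_first_data_key(page))
        return 1;

    int key = page->keys[off - 1];

    return (scankey > key) - (scankey < key);
}
```

With this sketch, whatever integer happens to sit in the first data slot
of an internal page never influences the result, which is why the value
displayed for that item should not be trusted.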

I would like to make it easier for competent non-experts on the B-Tree
code to eyeball a B-Tree with pageinspect, and be reasonably confident
that things add up. In order for such people to know that something is
wrong, we should explain what "right" looks like in moderate detail.
So, as I said, I feel an exact explanation of flags is unnecessary,
but I tend to agree that a brief reference to both page high keys and
"minus infinity" keys is appropriate, since users of the function will
see them all the time.

> I had a hard time understanding the remark about the root page. But in any
> case, if you look at the flags set e.g. with bt_page_stats(), the root page
> is flagged as also being a leaf page, when it is the only page in the index.
> So the root page is considered also a leaf page in that case.

I think that a better way of handling that originally would have been
to make root-ness a separate property from leaf-ness/internal-ness.
Too late for that now, I suppose.
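
For anyone eyeballing the flags, a rough sketch in C of how the two
properties overlap follows. The SKETCH_* macros and sketch_* helpers are
hypothetical names; the bit values are assumed to mirror BTP_LEAF and
BTP_ROOT in nbtree.h, and should be checked against the source tree
before being relied upon.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match BTP_LEAF and BTP_ROOT in nbtree.h (an assumption). */
#define SKETCH_BTP_LEAF (1 << 0)
#define SKETCH_BTP_ROOT (1 << 1)

/* True if the leaf bit is set in the page's flags. */
static bool
sketch_is_leaf(uint16_t flags)
{
    return (flags & SKETCH_BTP_LEAF) != 0;
}

/* True if the root bit is set in the page's flags. */
static bool
sketch_is_root(uint16_t flags)
{
    return (flags & SKETCH_BTP_ROOT) != 0;
}

/*
 * A page that is both root and leaf: the case described above, where
 * the root is the only page in the index.
 */
static bool
sketch_is_root_and_leaf(uint16_t flags)
{
    return sketch_is_leaf(flags) && sketch_is_root(flags);
}
```

Under those assumed bit values, a flags value of 3 has both bits set, so
a page can be a root and a leaf at once; the two flags are independent
bits rather than mutually exclusive page types.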

> I'd suggest saying the same thing (or more) with fewer words:
>
> In a B-tree leaf page, <structfield>ctid</> points to a heap tuple. In an
> internal page, it points to another page in the index itself, and the offset
> number part (the second number) of the ctid field is ignored.

That seems good. What do you think of the attached revision?
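
As a reading aid for the wording quoted above, here is a small sketch in
C of how a ctid value might be interpreted depending on the page type.
SketchCtid and sketch_describe_ctid() are hypothetical names made up for
illustration; they are not part of pageinspect or the backend.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the (block, offset) pair shown as ctid. */
typedef struct SketchCtid
{
    uint32_t    block;      /* first number of the ctid */
    uint16_t    offset;     /* second number of the ctid */
} SketchCtid;

/*
 * Write a human-readable interpretation of ctid into buf.
 * On a leaf page the ctid identifies a heap tuple; on an internal page
 * only the block number matters and it names another index page.
 * Returns the value returned by snprintf().
 */
static int
sketch_describe_ctid(bool is_leaf, SketchCtid ctid, char *buf, size_t buflen)
{
    if (is_leaf)
        return snprintf(buf, buflen, "heap tuple at block %u, offset %u",
                        (unsigned) ctid.block, (unsigned) ctid.offset);

    /* Internal page: the offset number part is ignored. */
    return snprintf(buf, buflen, "child index page at block %u",
                    (unsigned) ctid.block);
}
```

The same pair of numbers therefore reads differently depending on
whether the item came from a leaf page or an internal page.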

[1] http://www.postgresql.org/message-id/20140828.110824.1195843073079055852.t-ishii@sraoss.co.jp
--
Peter Geoghegan

Attachment Content-Type Size
bt_page_items_ctid.patch text/x-patch 1.1 KB
