Placing hints in line pointers

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Placing hints in line pointers
Date: 2013-06-01 14:45:02
Message-ID: CA+U5nMLzCXuK-hix4OJXFMeu--0W8=vWtLW-U8boOncZ=LMzdw@mail.gmail.com
Lists: pgsql-hackers

Notes on a longer term idea...

An item pointer (also called a line pointer) gives external references
a stable way to point to an item, while allowing us to place the tuple
itself anywhere on the page. An ItemId is 4 bytes long and currently
consists of (see src/include/storage/itemid.h)...

typedef struct ItemIdData
{
    unsigned    lp_off:15,      /* offset to tuple (from start of page) */
                lp_flags:2,     /* state of item pointer, see below */
                lp_len:15;      /* byte length of tuple */
} ItemIdData;

The offset to the tuple is 15 bits, which is sufficient to address
32768 separate byte positions; that is why we limit ourselves to
32kB blocks.

If we use 4 byte alignment for tuples, then we would never use the
lower 2 bits of lp_off, nor the lower 2 bits of lp_len. Those bits
would always be zero. (Obviously, with 8 byte alignment we would have
3 bits spare in each, but I'm looking for something that works the
same on various architectures for simplicity).
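As a quick illustration of the point above, rounding any value up to a
4 byte boundary leaves its low two bits clear. This is only a sketch:
align4() is a hypothetical helper written for this example, not a
macro taken from the PostgreSQL source.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: round len up to the next multiple of 4.
 * The result always has its two low-order bits clear, which is the
 * property the proposal relies on.
 */
static inline uint32_t
align4(uint32_t len)
{
    uint32_t    aligned = (len + 3u) & ~3u;

    assert((aligned & 3u) == 0);    /* low two bits carry no information */
    return aligned;
}
```

With 8 byte alignment the same argument frees three bits instead of
two.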

So my suggestion is to make lp_off and lp_len store the values in
terms of 4 byte chunks, which would allow us to rework the data
structure like this...

typedef struct ItemIdData
{
    unsigned    lp_off:13,          /* offset to tuple (from start of page),
                                     * as a number of 4 byte chunks */
                lp_xmin_hint:2,     /* committed and invalid hints for xmin */
                lp_flags:2,         /* state of item pointer, see below */
                lp_len:13,          /* length of tuple, as a number of 4 byte
                                     * chunks */
                lp_xmax_hint:2;     /* committed and invalid hints for xmax */
} ItemIdData;

i.e. we have room for 4 additional bits, and we use those to hold the
tuple hints for xmin and xmax.
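A minimal sketch of how the chunk-based fields might be read and
written, assuming 4 byte alignment throughout. The accessor names and
LP_CHUNK_SHIFT are invented for this example; they are not existing
PostgreSQL macros, and the struct is the proposed layout, not the
current one.

```c
#include <assert.h>
#include <stdint.h>

/* Proposed layout from this message, not the current definition. */
typedef struct ItemIdData
{
    unsigned    lp_off:13,          /* offset to tuple, in 4 byte chunks */
                lp_xmin_hint:2,     /* hints for xmin */
                lp_flags:2,         /* state of item pointer */
                lp_len:13,          /* length of tuple, in 4 byte chunks */
                lp_xmax_hint:2;     /* hints for xmax */
} ItemIdData;

#define LP_CHUNK_SHIFT 2            /* hypothetical: 4 byte chunks */

/* Store a byte offset and byte length; both must be multiples of 4. */
static inline void
item_id_set_storage(ItemIdData *itemId, uint32_t off, uint32_t len)
{
    assert((off & 3u) == 0 && (len & 3u) == 0);
    assert(off < 32768u && len < 32768u);
    itemId->lp_off = off >> LP_CHUNK_SHIFT;
    itemId->lp_len = len >> LP_CHUNK_SHIFT;
}

/* Recover the byte offset from the stored chunk count. */
static inline uint32_t
item_id_get_offset(const ItemIdData *itemId)
{
    return (uint32_t) itemId->lp_off << LP_CHUNK_SHIFT;
}

/* Recover the byte length from the stored chunk count. */
static inline uint32_t
item_id_get_length(const ItemIdData *itemId)
{
    return (uint32_t) itemId->lp_len << LP_CHUNK_SHIFT;
}
```

The struct still occupies 32 bits in total (13 + 2 + 2 + 13 + 2), so
the size of the line pointer array would not change.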

Doing this would have two purposes:

* We wouldn't need to follow the pointer if the row is marked aborted.
This would save a random memory access for that tuple.

* It would isolate the tuple hint values into a smaller area of the
block, so we would be able to avoid the annoyance of recalculating the
checksums for the whole block when a single bit changes.
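The first point could look roughly like the sketch below, where a scan
decides from the line pointer alone whether a tuple needs visiting.
The LP_HINT_* values and both function names are assumptions made for
this example, loosely modelled on the existing HEAP_XMIN_COMMITTED and
HEAP_XMIN_INVALID tuple header bits; the struct is the proposed
layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Proposed layout from this message, not the current definition. */
typedef struct ItemIdData
{
    unsigned    lp_off:13,
                lp_xmin_hint:2,
                lp_flags:2,
                lp_len:13,
                lp_xmax_hint:2;
} ItemIdData;

/* Hypothetical encodings for the 2-bit hint fields. */
#define LP_HINT_COMMITTED   0x1
#define LP_HINT_INVALID     0x2

/* True if the hint says the inserting transaction is invalid/aborted. */
static inline bool
item_id_xmin_known_aborted(const ItemIdData *itemId)
{
    return (itemId->lp_xmin_hint & LP_HINT_INVALID) != 0;
}

/*
 * Count the items a scan would still have to dereference.  Items whose
 * xmin is hinted as aborted are skipped without touching tuple data.
 */
static inline int
count_items_to_visit(const ItemIdData *items, int nitems)
{
    int         i;
    int         n = 0;

    assert(nitems >= 0);
    for (i = 0; i < nitems; i++)
    {
        if (!item_id_xmin_known_aborted(&items[i]))
            n++;
    }
    return n;
}
```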

We wouldn't need to do a FPW when a hint changes, we would only need
to take a copy of the ItemId array, which is much smaller. And it
could be protected by its own checksum.

(In addition, if we wanted, this could be used to extend block size to
64kB if we used 8-byte alignment for tuples.)
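The arithmetic behind that remark, as a small sketch:
max_addressable_bytes() is a hypothetical helper that multiplies the
number of values an offset field can hold by the chunk size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of bytes addressable by an offset field of
 * offset_bits bits when each unit is chunk_size bytes.
 */
static inline uint32_t
max_addressable_bytes(unsigned offset_bits, uint32_t chunk_size)
{
    assert(offset_bits < 16);       /* keep the product well inside 32 bits */
    assert(chunk_size <= 8);
    return ((uint32_t) 1 << offset_bits) * chunk_size;
}
```

So 15 bits of byte offsets and 13 bits of 4 byte chunks both reach
32768 bytes, while 13 bits of 8 byte chunks reach 65536 bytes.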

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
