Question regarding the database page layout.

From: "Ryan Bradetich" <rbradetich(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Question regarding the database page layout.
Date: 2008-08-11 08:07:00
Message-ID: e739902b0808110107j76320e95m1babda71833b4ff3@mail.gmail.com
Lists: pgsql-hackers

Hello all,

I have been digging into the database page layout (specifically the tuples)
to ensure the unsigned integer types were consuming the proper storage.
While digging around, I found one thing surprising:

It appears that heap tuples are padded at the end to a multiple of MAXALIGN.

Below is my data that I used to come to this conclusion.
(This test was performed on a 64-bit system with --with-blocksize=32).

The goal was to compare data from comparable type sizes.
The first column indicates the type (char, uint1, int2, uint2, int4, and uint4);
the number in parentheses indicates the number of columns in the table.

The Length is from the lp_len field in the ItemId structure.
The Offset is from the lp_off field in the ItemId structure.
The Size is the difference between a tuple's offset and the offset of the
next tuple stored below it.

char (1)                  char (9)
Length  Offset  Size      Length  Offset  Size
25      32736   32        33      32728   40
25      32704   32        33      32688   40
25      32672   32        33      32648   40
25      32640             33      32608

uint1 (1)                 uint1 (9)
Length  Offset  Size      Length  Offset  Size
25      32736   32        33      32728   40
25      32704   32        33      32688   40
25      32672   32        33      32648   40
25      32640             33      32608

int2 (1)                  int2 (5)
Length  Offset  Size      Length  Offset  Size
26      32736   32        34      32728   40
26      32704   32        34      32688   40
26      32672   32        34      32648   40
26      32640             34      32608

uint2 (1)                 uint2 (5)
Length  Offset  Size      Length  Offset  Size
26      32736   32        34      32728   40
26      32704   32        34      32688   40
26      32672   32        34      32648   40
26      32640             34      32608

int4 (1)                  int4 (3)
Length  Offset  Size      Length  Offset  Size
28      32736   32        36      32728   40
28      32704   32        36      32688   40
28      32672   32        36      32648   40
28      32640             36      32608

uint4 (1)                 uint4 (3)
Length  Offset  Size      Length  Offset  Size
28      32736   32        36      32728   40
28      32704   32        36      32688   40
28      32672   32        36      32648   40
28      32640             36      32608

From the documentation at:
http://www.postgresql.org/docs/8.3/static/storage-page-layout.html
and from the comments in src/include/access/htup.h, I understand that the
offset to the user data (t_hoff) must be a multiple of MAXALIGN, but I did not
find anything suggesting that the heap tuple itself has this requirement.

After a cursory glance at the HeapTupleHeaderData structure, it appears it
could be aligned with INTALIGN instead of MAXALIGN. The one field I was
worried about was the 6-byte t_ctid structure. The comments in the
src/include/storage/itemptr.h file indicate that the ItemPointerData structure
is composed of three int16 fields, so everything in the HeapTupleHeaderData
structure is 32 bits or less.

I am interested in attempting to generate a patch if this idea appears
feasible. On the data set I am currently working with, it would save over
3GB of disk space. (Back-of-the-envelope calculations indicate that 5% of my
current storage is consumed by this padding. My tuple length is 44 bytes.)

Thanks,

- Ryan
