Quick Links

Re: The "many nulls" problem

From:	"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To:	"Kynn Jones" <kynnjo(at)gmail(dot)com>
Cc:	<pgsql-performance(at)postgresql(dot)org>
Subject:	Re: The "many nulls" problem
Date:	2008-03-14 19:46:00
Message-ID:	47DAD5F8.6060804@enterprisedb.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Kynn Jones wrote:
> In all these cases, the design choice, at least according to RDb's 101, is
> between including a column in the table that will be NULL most of the time,
> or defining a second auxiliary column that references the first one and
> holds the non-redundant information for the minority of cases for which this
> is necessary (and maybe define a VIEW that includes all the columns).
>
> But for me it is a frequent occurrence that my quaint and simple RDb's 101
> reasoning doesn't really apply for PostgreSQL. Basically, Pg is too smart
> for it! For example, does a large proportion of NULLs really imply a lot of
> wasted space?

It depends. If there's *any* NULLs on a row, a bitmap of the NULLs is
stored in the tuple header. Without NULL bitmap, the tuple header is 23
bytes, and due to memory alignment, it's always rounded up to 24 bytes.
That one padding byte is "free" for use as NULL bitmap, so it happens
that if your table has eight columns or less, NULLs will take no space
at all. If you have more columns than that, if there's *any* NULLs on a
row you'll waste a whole 4 or 8 bytes (or more if you have a very wide
table and go beyond the next 4/8 byte boundary), depending on whether
you're on a 32-bit or 64-bit platform, regardless of how many NULLs
there is.

That's on 8.3. 8.2 and earlier versions are similar, but the tuple
header used to be 27 bytes instead of 23, so you have either one or five
"free" bytes, depending on architecture.

In any case, that's pretty good compared to many other RDBMSs.

> Maybe this is true for fixed-length data types, but what
> about for type TEXT or VARCHAR?

Datatype doesn't make any difference. Neither does fixed vs variable length.

> What's your schema design approach for such situations? How would you go
> about deciding whether the number of exceptional cases is small enough to
> warrant a second table? Of course, one could do a systematic profiling of
> various possible scenarios, but as a first approximation what's your
> rule-of-thumb?

From performance point of view, I would go with a single table with
NULL fields on PostgreSQL.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

The "many nulls" problem at 2008-03-14 18:05:50 from Kynn Jones

Responses

Re: The "many nulls" problem at 2008-03-14 20:05:20 from Kynn Jones

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Kynn Jones	2008-03-14 20:05:20	Re: The "many nulls" problem
Previous Message	Scott Marlowe	2008-03-14 19:43:33	Re: Hardware question for a DB server