Re: Fixed length data types issue

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixed length data types issue
Date: 2006-09-07 10:47:47
Message-ID: 87irk0c5rg.fsf@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:

> So you end up storing the typmod in the Datum itself, which brings you
> right back to varlena.

Not really since the Datum doesn't actually end up on disk in the case of
pass-by-reference.

which leads us to:

> Well, the root of the problem depends on your perspective. If the
> purpose behind all of this is to save disk space, perhaps the root of
> the problem is that disk representation and memory representation are
> intimately tied?

Indeed.

Consider this real table definition I found in a few moments searching for
schemas on google:

PRVDR_CTRL_TYPE_CD: CHAR(2)
PRVDR_NUM: CHAR(6)
NPI: NUMBER
RPT_STUS_CD: CHAR(1)
FY_BGN_DT: DATE
FY_END_DT: DATE
PROC_DT: DATE
INITL_RPT_SW: CHAR(1)
LAST_RPT_SW: CHAR(1)
TRNSMTL_NUM: CHAR(2)
FI_NUM: CHAR(5)
ADR_VNDR_CD: CHAR(1)
FI_CREAT_DT: DATE
UTIL_CD: CHAR(1)
NPR_DT: DATE
SPEC_IND: CHAR(1)
FI_RCPT_DT: DATE

By my count postgres would use 154 bytes for this record. Whereas in fact
there's no need for it to take more than 87 bytes. Almost 100% overhead for
varattlen headers and the padding they necessitate.

This is not a pathological example. This is a very common style of database
schema definition. Many many database tables in the real world are a 1-1
translations of existing flat file databases which have lots of short fixed
length ascii codes. Any database interacting with any old school inventory
management systems, financial databases, marketing database, etc is likely to
be of this form.

So it seems what has to happen here is we need a way of defining a data type
that has a different on-disk representation from its in-memory definition.
That means a lot more cpu overhead since I imagine it will mean pallocing the
in-memory representation before you can actually do anything with the data.

The disk reader and writer functions could probably use the typmod but it
seems what they really want to have access to is the attlen because what they
really want to know is the length of the object that their pointer refers to.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nimesh Satam 2006-09-07 10:49:01 Re: [PATCHES] Template0 age is increasing speedily.
Previous Message Chris Mair 2006-09-07 10:40:25 Re: [PATCHES] Template0 age is increasing speedily.