Re: Reducing the overhead of NUMERIC data

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing the overhead of NUMERIC data
Date: 2005-11-03 14:09:26
Message-ID: 20051103140926.GC15795@svana.org
Lists: pgsql-hackers pgsql-patches

On Thu, Nov 03, 2005 at 01:49:46PM +0000, Simon Riggs wrote:
> In other databases, CHAR(12) and NUMERIC(12) are fixed length datatypes.
> In PostgreSQL, they are dynamically varying datatypes.

Please explain how a CHAR(12) can store 12 UTF-8 characters when each
character may be 1 to 4 bytes, unless the CHAR itself is variable
length...
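
To make the size problem concrete, here is a small Python sketch (not PostgreSQL code; the helper name and the sample strings are made up for illustration) that measures how many bytes twelve characters need in UTF-8:

```python
def utf8_size(text: str) -> int:
    """Return the number of bytes needed to encode text as UTF-8."""
    return len(text.encode("utf-8"))

# Four strings of exactly 12 characters each, built from characters
# that need 1, 2, 3 and 4 bytes in UTF-8 respectively.
samples = {
    "ascii": "a" * 12,             # U+0061, 1 byte per character
    "latin": "\u00e9" * 12,        # U+00E9 (e with acute), 2 bytes
    "euro": "\u20ac" * 12,         # U+20AC (euro sign), 3 bytes
    "musical": "\U0001d11e" * 12,  # U+1D11E (G clef), 4 bytes
}

sizes = {name: utf8_size(text) for name, text in samples.items()}

for name, text in samples.items():
    print(f"{name}: {len(text)} characters, {sizes[name]} bytes")
```

The same declared length of 12 characters covers anything from 12 to 48 bytes, so no single fixed byte width fits all of them.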

> What actually happens is that in many other systems the datatype is the
> same, but additional metadata is provided for that particular attribute.
> So CHAR(12) is a datatype of CHAR with a metadata item called length
> which is set to 12 for that attribute.

We already have this metadata: it's called atttypmod and it's stored in
pg_attribute. That's where the 12 for CHAR(12) is stored BTW.
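
As a rough illustration of how that per-attribute number relates to the declared length, here is a Python sketch. It assumes that for character types the typmod is kept as the declared length plus the 4-byte header size, and that -1 means "no length declared"; the function names are hypothetical and this is not the server's code.

```python
from typing import Optional

VARHDRSZ = 4  # assumed size of the length header, in bytes


def char_typmod(declared_length: int) -> int:
    """Encode a declared CHAR(n) length as a typmod value (assumed n + VARHDRSZ)."""
    if declared_length < 1:
        raise ValueError("declared length must be at least 1")
    return declared_length + VARHDRSZ


def declared_length(typmod: int) -> Optional[int]:
    """Decode a typmod back to the declared length, or None if unconstrained."""
    if typmod < VARHDRSZ:
        return None  # e.g. -1: no length was declared for the attribute
    return typmod - VARHDRSZ


print(char_typmod(12))      # typmod for CHAR(12) under the stated assumption
print(declared_length(16))  # and back to the declared length
```

Either way the point stands: the length limit lives once per column in the catalog, not in every stored value.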

> On PostgreSQL, CHAR(12) is a bpchar datatype with all instantiations of
> that datatype having a 4 byte varlena header. In this example, all of
> those instantiations having the varlena header set to 12, so essentially
> wasting the 4 byte header.

Nope, the varlena header stores the actual length on disk. If you store
"hello" in a char(12) field it takes only 9 bytes (4 for the header, 5
for the data), which is less than 12.
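
The arithmetic in that paragraph can be sketched in Python as follows. This simply follows the description given here (4-byte header plus the bytes of the data); it does not model blank padding, alignment or compression, and the function name is made up for illustration.

```python
VARHDRSZ = 4  # the 4-byte varlena length header discussed in this thread


def varlena_size(value: str, encoding: str = "utf-8") -> int:
    """Header bytes plus encoded data bytes, per the description above."""
    return VARHDRSZ + len(value.encode(encoding))


hello_size = varlena_size("hello")
print(hello_size)  # 4 bytes of header + 5 bytes of data
```

The header records the length of this particular value, which is why it is not redundant with the per-column length limit.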

Good ideas, but it all hinges on the assumption that CHAR(12) can take a
fixed amount of space, which simply isn't true in a multibyte encoding.

Having a different header for things shorter than 255 bytes has been
discussed before, but that's another argument.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
