| From: | mark(at)mark(dot)mielke(dot)cc | 
|---|---|
| To: | Martijn van Oosterhout <kleptog(at)svana(dot)org> | 
| Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, pgsql-hackers(at)postgresql(dot)org | 
| Subject: | Re: Reducing the overhead of NUMERIC data | 
| Date: | 2005-11-03 17:28:02 | 
| Message-ID: | 20051103172802.GA28463@mark.mielke.cc | 
| Lists: | pgsql-hackers pgsql-patches | 
On Thu, Nov 03, 2005 at 03:09:26PM +0100, Martijn van Oosterhout wrote:
> On Thu, Nov 03, 2005 at 01:49:46PM +0000, Simon Riggs wrote:
> > In other databases, CHAR(12) and NUMERIC(12) are fixed length datatypes.
> > In PostgreSQL, they are dynamically varying datatypes.
> Please explain how a CHAR(12) can store 12 UTF-8 characters when each
> character may be 1 to 4 bytes, unless the CHAR itself is variable
> length...
> ...
> Nope, the varlena header stores the actual length on disk. If you store
> "hello" in a char(12) field it takes only 9 bytes (4 for the header, 5
> for the data), which is less than 12.
> ...
> Having a different header for things shorter than 255 bytes has been
> discussed before, that's another argument though.
It's unfortunate that the length is encoded multiple times. In UTF-8,
for instance, each character has its length encoded in the most
significant bits of its first byte. It is complicated to extract, but
the length is effectively stored twice: once in the header, and once in
the combination of the column attribute and the per-character lengths.
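To illustrate the point about lengths hiding in the lead bytes, here is a minimal Python sketch (not PostgreSQL code). The helper names `utf8_seq_len` and `byte_len_of_n_chars` are made up for this example. It assumes well-formed UTF-8 and shows that, given only the declared character count from the column attribute, the byte length can be recovered without any header:

```python
def utf8_seq_len(lead: int) -> int:
    """Return the byte length of a UTF-8 sequence, read from its lead byte."""
    if lead < 0x80:            # 0xxxxxxx: ASCII, 1 byte
        return 1
    if lead >> 5 == 0b110:     # 110xxxxx: 2-byte sequence
        return 2
    if lead >> 4 == 0b1110:    # 1110xxxx: 3-byte sequence
        return 3
    if lead >> 3 == 0b11110:   # 11110xxx: 4-byte sequence
        return 4
    raise ValueError(f"not a UTF-8 lead byte: {lead:#04x}")


def byte_len_of_n_chars(buf: bytes, n_chars: int) -> int:
    """Walk n_chars characters from the start of buf; return bytes consumed.

    n_chars plays the role of the column attribute (e.g. 12 for CHAR(12)).
    """
    pos = 0
    for _ in range(n_chars):
        if pos >= len(buf):
            raise ValueError("buffer holds fewer characters than declared")
        pos += utf8_seq_len(buf[pos])
    return pos


if __name__ == "__main__":
    value = "héllo€".encode("utf-8")
    print(byte_len_of_n_chars(value, 6))
```

The walk has to touch every character, which is the "complicated to extract" part: a header gives the length in one read, while this costs one step per character.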
For "other databases", the column could be encoded as 2-byte or 4-byte
characters, allowing it to be fixed length. I doubt that ASCII
characters are stored more efficiently in such formats than with a
length header plus per-character length encoding, but for multibyte
characters, the race is probably even. :-)
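A rough back-of-the-envelope comparison, sketched in Python. It follows the arithmetic in the quoted message (a 4-byte header plus the bytes of the value, ignoring any blank padding) and assumes a hypothetical fixed-width format that always reserves the full declared width. The function names are made up for this example:

```python
HEADER_BYTES = 4  # length header size, as stated in the quoted message


def header_plus_utf8_size(value: str) -> int:
    """Bytes used by a length header followed by the UTF-8 encoded value."""
    return HEADER_BYTES + len(value.encode("utf-8"))


def fixed_width_size(declared_chars: int, bytes_per_char: int) -> int:
    """Bytes used by a headerless column with a fixed width per character."""
    return declared_chars * bytes_per_char


if __name__ == "__main__":
    declared = 12                 # CHAR(12)
    ascii_value = "hello"         # 5 ASCII characters
    cjk_value = "日本語" * 4       # 12 characters, 3 bytes each in UTF-8

    for label, value in (("ascii", ascii_value), ("cjk", cjk_value)):
        print(label,
              header_plus_utf8_size(value),
              fixed_width_size(declared, 2),
              fixed_width_size(declared, 4))
```

Under these assumptions the short ASCII value needs 9 bytes with a header versus 24 or 48 bytes fixed, while a full column of 3-byte characters needs 40 bytes with a header versus 24 or 48 fixed, so the outcome depends on the data.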
I dunno... no opinion on the matter here, but I did want to point out
that the field can be fixed length without a header. Those proposing such
a change, however, should accept that it may come at an overall cost in
storage.
Cheers,
mark
-- 
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...