From: | Martijn van Oosterhout <kleptog(at)svana(dot)org> |
---|---|
To: | mark(at)mark(dot)mielke(dot)cc |
Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Reducing the overhead of NUMERIC data |
Date: | 2005-11-03 18:18:55 |
Message-ID: | 20051103181854.GE15795@svana.org |
Lists: | pgsql-hackers pgsql-patches |
On Thu, Nov 03, 2005 at 12:28:02PM -0500, mark(at)mark(dot)mielke(dot)cc wrote:
> It's unfortunate that the length is encoded multiple times. In UTF-8,
> for instance, each character has its length encoded in the most
> significant bits. Complicated to extract, however, the data is encoded
> twice. 1 in the header, and 1 in the combination between the column
> attribute, and the per character lengths.
>
> For "other databases", the column could be encoded as 2 byte characters
> or 4 byte characters, allowing it to be fixed. I find myself doubting
> that ASCII characters could be encoded more efficiently in such formats,
> than the inclusion of a length header and per character length encoding,
> but for multibyte characters, the race is probably even. :-)
That's called UTF-16 and is currently not supported by PostgreSQL at
all. That may change, since the locale library ICU requires UTF-16 for
everything.

The question is, if someone declares a field CHAR(20), do they really
mean to fix 40 bytes of storage for each and every row? I doubt it;
that's even more wasteful of space than a varlena header.

Which puts you right back to variable-length fields.
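
To put rough numbers on the CHAR(20) comparison, here is a minimal sketch. It assumes a 4-byte varlena length header, a fixed 2 bytes per character for the headerless layout, and CHAR(n) values space-padded to the declared length; the function names are made up for illustration and are not PostgreSQL internals.

```python
# Assumptions (not taken from PostgreSQL source): 4-byte length header,
# fixed 2-byte characters for the headerless layout.
VARLENA_HEADER_BYTES = 4
FIXED_BYTES_PER_CHAR = 2


def fixed_width_bytes(declared_len):
    """Bytes per row if every character slot takes a fixed 2 bytes."""
    return declared_len * FIXED_BYTES_PER_CHAR


def varlena_bytes(value, declared_len):
    """Bytes per row for a space-padded UTF-8 value plus a length header."""
    padded = value.ljust(declared_len)
    return VARLENA_HEADER_BYTES + len(padded.encode("utf-8"))


print(fixed_width_bytes(20))       # 40
print(varlena_bytes("hello", 20))  # 24
```

For plain ASCII data the fixed layout costs 40 bytes per row against 24 with the header, which is the waste described above.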
> I dunno... no opinion on the matter here, but I did want to point out
> that the field can be fixed length without a header. Those proposing such
> a change, however, should accept that this may result in an overall
> expense.
The only time this may be useful is for *very* short fields, on the
order of 4 characters or fewer. Beyond that, the space wasted by
fixed-width storage swamps the cost of the varlena header...
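
The break-even point can be checked with a short sketch. It assumes a 4-byte header, 1 byte per character for ASCII text in the variable-length layout, and 2 bytes per character in the fixed layout; `break_even_chars` and its parameters are hypothetical names used only for this illustration.

```python
def break_even_chars(header_bytes=4, fixed_per_char=2, varlen_per_char=1,
                     max_len=255):
    """Longest field length n (up to max_len) at which fixed-width storage
    is still no larger than header + variable-length storage."""
    best = 0
    for n in range(1, max_len + 1):
        if fixed_per_char * n <= header_bytes + varlen_per_char * n:
            best = n
    return best


print(break_even_chars())                  # 4
print(break_even_chars(fixed_per_char=4))  # 1
```

Under these assumptions the fixed layout only ties or wins up to 4 characters, and with 4-byte characters it stops paying off after a single character.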
Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.