Re: Reducing the overhead of NUMERIC data

From: mark(at)mark(dot)mielke(dot)cc
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Maxwell <gmaxwell(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing the overhead of NUMERIC data
Date: 2005-11-04 13:38:38
Message-ID: 20051104133838.GA2021@mark.mielke.cc
Lists: pgsql-hackers pgsql-patches

On Thu, Nov 03, 2005 at 09:17:43PM -0500, Tom Lane wrote:
> Gregory Maxwell <gmaxwell(at)gmail(dot)com> writes:
> > Another way to look at this is in the context of compression: With
> > unicode, characters are really 32bit values... But only a small range
> > of these values is common. So we store and work with them in a
> > compressed format, UTF-8.
> > As such it might be more interesting to ask some other questions like:
> > are we using the best compression algorithm for the application, and,
> > why do we sometimes stack two compression algorithms?
> Actually, the real reason we use UTF-8 and not any of the
> sorta-fixed-size representations of Unicode is that the backend is by
> and large an ASCII, null-terminated-string engine. *All* of the
> supported backend encodings are ASCII-superset codes. Making
> everything null-safe in order to allow use of UCS2 or UCS4 would be
> a huge amount of work, and the benefit is at best questionable.

Perhaps on a side note - my intuition (which sometimes lies) tells me
that, if the above is true, the backend is making unnecessary copies of
read-only data, if only to insert a '\0' at the end of the strings.
Is this true?
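
To make the concern concrete, here is a minimal sketch in C of the kind of
copy I mean. It is not taken from the PostgreSQL source: the counted_str
type and both function names are hypothetical, and I am only assuming that
stored strings carry a length instead of a terminator. The first function
pays for an allocation and a copy just to append the '\0'; the second works
on the counted bytes in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-counted string: the bytes are NOT null-terminated
 * and may point into a read-only buffer shared with other values. */
typedef struct {
    size_t      len;
    const char *data;
} counted_str;

/* Copy the bytes into a fresh buffer and append '\0' so that ordinary
 * C string functions can be used. The caller must free() the result.
 * Returns NULL if the allocation fails. */
char *counted_to_cstring(const counted_str *s)
{
    assert(s != NULL && s->data != NULL);

    char *out = malloc(s->len + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, s->data, s->len);   /* the copy in question */
    out[s->len] = '\0';             /* the only byte actually added */
    return out;
}

/* Equality test that needs no terminator and therefore no copy:
 * compare the lengths first, then the counted bytes. */
int counted_equals(const counted_str *a, const counted_str *b)
{
    assert(a != NULL && b != NULL);
    return a->len == b->len && memcmp(a->data, b->data, a->len) == 0;
}
```

Whether the copy can be avoided in practice depends on whether the code
consuming the string insists on a terminator, which is exactly what I am
asking about.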

I'm thinking along the lines of the other threads that speak of PostgreSQL
being CPU or I/O bound, not disk bound, for many sorts of operations. Is
PostgreSQL unnecessarily copying string data around (and other data, I
would assume)?

Cheers,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/
