Re: Fixed length data types issue

From: mark(at)mark(dot)mielke(dot)cc
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Gregory Stark <gsstark(at)mit(dot)edu>, andrew(at)supernews(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fixed length data types issue
Date: 2006-09-08 20:31:23
Message-ID: 20060908203123.GA16397@mark.mielke.cc
Lists: pgsql-hackers

On Fri, Sep 08, 2006 at 02:39:03PM -0400, Alvaro Herrera wrote:
> mark(at)mark(dot)mielke(dot)cc wrote:
> > I think I've been involved in a discussion like this in the past. Was
> > it mentioned in this list before? Yes the UTF-8 vs UTF-16 encoding
> > means that UTF-8 applications are at a disadvantage when using the
> > library. UTF-16 is considered more efficient to work with for everybody
> > except ASCII users. :-)
> Uh, is it? By whom? And why?

The authors of the library in question? Java? Anybody whose primary
alphabet isn't LATIN1 based? :-)

Only ASCII values are stored more compactly in UTF-8. Values from 128
up to U+07FF take two bytes in either encoding, and everything from
U+0800 to U+FFFF takes three bytes in UTF-8 but only two in UTF-16.
UTF-16 is also easier to process: UTF-8 requires too many bit checks
to step through a string one character at a time. I'm not an expert - I
had this same question a year or two ago, and read up on the ideas of
experts.
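
The size comparison can be checked directly. What follows is a minimal
Python sketch, not part of the original message; the sample characters
and the helper name encoded_sizes are assumptions picked for
illustration. It counts the bytes each encoding needs for one character
(UTF-16 measured without a byte order mark):

```python
# Compare encoded sizes of single characters in UTF-8 and UTF-16.
# The sample characters are illustrative assumptions, not from the thread.
samples = {
    "A": "U+0041, ASCII",
    "é": "U+00E9, Latin-1 range",
    "Ж": "U+0416, Cyrillic",
    "中": "U+4E2D, CJK",
    "😀": "U+1F600, outside the BMP",
}


def encoded_sizes(ch):
    """Return (utf8_bytes, utf16_bytes) for one character, without a BOM."""
    # "utf-16-le" is used so that Python does not prepend a 2-byte BOM.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for ch, label in samples.items():
    u8, u16 = encoded_sizes(ch)
    print(f"{label}: UTF-8 = {u8} bytes, UTF-16 = {u16} bytes")
```

With these samples, UTF-8 is smaller only for the ASCII character, the
two encodings tie for the Latin-1 and Cyrillic characters and for the
character outside the BMP, and UTF-16 is smaller only for the CJK
character.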

Cheers,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/
