Re: Impact of UNICODE encoding on performance

From: Reshat Sabiq <sabiq(at)purdue(dot)edu>
To: aarni(dot)ruuhimaki(at)kymi(dot)com
Cc: Harry Mantheakis <harry(at)mantheakis(dot)freeserve(dot)co(dot)uk>,pgsql-novice(at)postgresql(dot)org
Subject: Re: Impact of UNICODE encoding on performance
Date: 2004-03-18 03:27:14
Message-ID: 40591712.3040905@purdue.edu
Lists: pgsql-novice
I'm not very knowledgeable on this, but I think you should try UTF-8 
from the start, given your expectations. I am able to save UTF-8 strings 
into a LATIN-1 db and retrieve them using JDBC, but viewing them in 
pgAdmin III is not a pretty sight (understandably). But I haven't used it 
extensively, and I think that queries (comparisons) might be affected 
with this setup (i.e., a string of 2 LATIN-1 characters whose bytes match 
those of 1 UTF-8 character would compare equal to its UTF-8 counterpart, 
which is clearly not intended). On the other hand, it is also conceivable 
that queries won't be affected, if no meaningless overlaps like that can occur.
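
To make the overlap concrete, here is a minimal Python sketch of what I mean. It has nothing to do with JDBC or PostgreSQL itself; it only shows how the bytes of one UTF-8 character look when read back as LATIN-1. The variable names are made up for illustration.

```python
# Illustration only: how one UTF-8 character appears to a LATIN-1 reader.
original = "\u00e9"                      # 'é', a single character

# What a UTF-8 client would send: two bytes, 0xC3 0xA9.
utf8_bytes = original.encode("utf-8")

# What a LATIN-1 database (or pgAdmin) believes those bytes mean:
# two separate characters, 'Ã' (U+00C3) and '©' (U+00A9).
seen_as_latin1 = utf8_bytes.decode("latin-1")

# A genuine two-character LATIN-1 string with exactly those characters...
genuine_latin1 = "\u00c3\u00a9"

# ...is stored as the very same bytes, so a byte-level comparison cannot
# tell the two apart. This is the unintended equality mentioned above.
collides = genuine_latin1.encode("latin-1") == utf8_bytes

print(len(original), len(seen_as_latin1), collides)
```

So the round trip works as long as every client decodes the bytes as UTF-8 again, but lengths and comparisons inside the database are computed on the LATIN-1 interpretation.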
In general, I have read that Unicode is somewhat slower (understandably), but 
I don't think the difference is significant. One just needs a sensible 
character comparison method that does a bitwise comparison first, so I don't 
think the overhead is big. There are probably studies on the web.
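
As a rough sketch of where the overhead comes from (my own illustration, not a measurement of PostgreSQL): characters above ASCII take 2 bytes in UTF-8 instead of 1 in LATIN-1, and a plain byte-for-byte comparison of UTF-8 strings gives the same order as comparing code points. The sample words are arbitrary, and a real database may sort by locale rules instead, which this does not model.

```python
# Illustration only: storage size and byte-wise ordering of UTF-8 text.
words = ["zebra", "\u00e9p\u00e9e", "apple", "\u00c4rger", "na\u00efve"]

# Total bytes needed for the same text in each encoding.
latin1_size = sum(len(w.encode("latin-1")) for w in words)
utf8_size = sum(len(w.encode("utf-8")) for w in words)

# Sorting by raw UTF-8 bytes matches sorting by Unicode code point,
# so a cheap byte-wise comparison is a valid first step.
by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))
by_code_points = sorted(words)

print(latin1_size, utf8_size, by_bytes == by_code_points)
```

For mostly-ASCII data the size difference is small, which is why I would not expect a dramatic penalty.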

-- 
Sincerely,
Reshat.

---
If you see my certificate with this message, you should be able to send me encrypted e-mail. 
Please consult your e-mail client for details if you would like to do that.



Aarni Ruuhimäki wrote:

>Hi Harry,
>
>Dunno about the performance penalty, but so far I am happy with LATIN1 dbase 
>system (RH and Trustix). Even with cyrillic characters. Then again, I work 
>with browser interfaces and it's not really up to me what encoding the client 
>has or has not installed. <if western, charset=iso-8859-1, if fellow 
>russki harasoo charset=windows-1251> is, I guess, a good bet. It's a windows 
>world, so far.
>
>Soviet KOI-X X, KOI8-r, KOI8-RU, Mac Cyrillic (Standard), CyrWin Cyrillic and 
>the rest of the soup ...
>
>Some experience and my half a pea.
>
>BR,
>
>Aarni
>
>
>On Tuesday 16 March 2004 12:43, you wrote:
>
>>Hello
>>
>>I am just setting out on a new project, having recently switched to
>>PostgreSQL.
>>
>>My immediate requirements would be satisfied with ISO-8859-1 (LATIN-1)
>>encoding, but it is conceivable that, if things go really well, somewhere
>>in the future my character encoding requirements will broaden.
>>
>>So I am tempted to specify UNICODE from the outset, and be done with it.
>>
>>But I cannot help wondering how much of a performance penalty this entails.
>>
>>If the performance hit is not significant, I shall be happy to stick with
>>UNICODE.
>>
>>But if anyone has any strong views (or experience) on this issue I shall be
>>very grateful for some feedback.
>>
>>Many thanks.
>>
>>Harry Mantheakis
>>London, UK
