Re: basic questions: Postgres with yum on CentOS 5.1

From: Chuck <chuckr(at)velofish(dot)com>
To: Tomasz Ostrowski <tometzky(at)batory(dot)org(dot)pl>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: basic questions: Postgres with yum on CentOS 5.1
Date: 2008-01-07 07:28:23
Message-ID: p06230911c3a770dda526@[192.168.1.55]
Lists: pgsql-general

I'm sorry for my delayed response. Tomasz, thanks for your email.

At 2:38 PM +0100 1/3/08, Tomasz Ostrowski wrote:
>On Tue, 01 Jan 2008, Chuck wrote:
>
>> I'm not sure how to "make sure automatic updates are turned on" as
>> Tometzky recommended. Is that a yum setting?
>
>You need to install and configure "yum-updatesd" to perform automatic
>updates for you. I don't use it so I don't know exactly how to do
>this, but I believe it has a well documented configuration file in
>/etc/.

I'll look into this further. Thanks.

>> [root(at)vs191 ~]# service postgresql start
>> Initializing database: [ OK ]
>> When I listed the current databases, I found out that UTF-8 is not
>> being used.
>
>You did not set your /etc/sysconfig/i18n and reboot before you first
>started, ignoring my recommendation. I'd delete /var/lib/pgsql/data
>(if there's no data yet) and try again after this setting and reboot.

Since sending that email, I contacted my web host for help. They
said that I could add '-E UTF8 --no-locale' to the initdb call within
/etc/init.d/postgresql. I stopped postgres, deleted the data
directory and restarted postgres. My cluster was now using UTF-8:

bash-3.1$ psql -l
           List of databases
       Name       |  Owner   | Encoding
------------------+----------+----------
 postgres         | postgres | UTF8
 template0        | postgres | UTF8
 template1        | postgres | UTF8
(6 rows)

I used 'createdb myTest' to create a new database that uses UTF-8.

My main concern was to set the encoding to UTF-8. I knew that was
important. I believe that I did that with the '-E UTF8' option for
initdb. Sort order, and specifically setting LC_COLLATE and LC_CTYPE,
was less of a concern. (I still need to read and learn more.)

Fortunately, I expect to be able to delete my /var/lib/pgsql/data
directory for the next few weeks, if necessary. I wanted to
investigate your recommendation further before accepting it.

By the way, do you think that specifying '--locale=en_US.UTF-8' for
initdb is equivalent to having LANG="en_US.UTF-8" set in the
"/etc/sysconfig/i18n" file (and rebooting)?

>> I need to store multiple languages in my database such as English, French
>>
>> The end of the "21.2.2. Setting the Character Set" section says, "One way
>> to use multiple encodings safely is to set the locale to C or POSIX during
>> initdb, thus disabling any real locale awareness."
>
>This is a very bad solution, as it would allow you to store any
>garbage string in a database. It won't know letter boundaries, so any
>text functions will misbehave badly. When your database encoding is
>UTF-8 then you'll be forced to save consistent UTF-8 strings and
>sorting, text functions, regular expressions etc... will work as
>expected.

I agree with you that enforcing a database encoding of UTF-8 is a
good approach. I believe that I'm doing that.
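
To check my own understanding of what "consistent UTF-8 strings" means, here
is a minimal Python sketch. It only illustrates the idea of rejecting byte
sequences that are not well-formed UTF-8; it is not how PostgreSQL implements
the check, and is_valid_utf8 is a name I made up for the example.

```python
def is_valid_utf8(raw):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The same word, once encoded as UTF-8 and once as Latin-1.
good = "café".encode("utf-8")      # the e-acute becomes two bytes: 0xC3 0xA9
bad = "café".encode("latin-1")     # the e-acute becomes one byte: 0xE9

print(is_valid_utf8(good))
print(is_valid_utf8(bad))
```

The Latin-1 bytes fail because a lone 0xE9 is not a complete UTF-8 sequence,
which is the kind of input I would expect a UTF-8 database to refuse.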

If I'm storing multiple languages such as English, French and
Japanese, do I really want to specify an English locale for English
sorting only (which will affect indexes)? If I have multiple
languages and must pick one locale for Postgres, is no locale
(with UTF-8 encoding) acceptable?
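
As I understand it, with the C locale strings are compared byte by byte, so
with UTF-8 the sort order follows the encoded bytes rather than any
language's dictionary rules. A small Python sketch of what that ordering
would look like (the word list is invented for illustration, and this models
the comparison without touching Postgres):

```python
# Mixed English, French and Japanese sample words (made up for this example).
words = ["zebra", "école", "apple", "Zebra", "日本"]

# Sort by the raw UTF-8 bytes, which is how I understand C-locale
# comparison to behave in a UTF-8 database.
by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

print(by_bytes)
```

In this ordering uppercase ASCII comes before lowercase, and accented and
Japanese text sort after all plain ASCII, so "école" lands after "zebra".
That is predictable, but it is not what a French or English reader would
expect from a dictionary sort.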

This reference at the end of the "21.2.2. Setting the Character Set"
section in the 8.1 manual still makes sense to me:

  Important: Although you can specify any encoding you want for a
  database, it is unwise to choose an encoding that is not what is
  expected by the locale you have selected. The LC_COLLATE and LC_CTYPE
  settings imply a particular encoding, and locale-dependent operations
  (such as sorting) are likely to misinterpret data that is in an
  incompatible encoding.

  Since these locale settings are frozen by initdb, the apparent
  flexibility to use different encodings in different databases of a
  cluster is more theoretical than real. It is likely that these
  mechanisms will be revisited in future versions of PostgreSQL.

  One way to use multiple encodings safely is to set the locale to C
  or POSIX during initdb, thus disabling any real locale awareness.

http://www.postgresql.org/docs/8.1/static/multibyte.html

Am I on the right track? Any thoughts would be appreciated.

Thanks,
Chuck
