Re: Multicolumn index corruption on 8.4 beta 2

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Floris Bos / Maxnet <bos(at)je-eigen-domein(dot)nl>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Multicolumn index corruption on 8.4 beta 2
Date: 2009-06-10 19:42:44
Message-ID: 11966.1244662964@sss.pgh.pa.us
Lists: pgsql-hackers

Floris Bos / Maxnet <bos(at)je-eigen-domein(dot)nl> writes:
> Hi,
> Tom Lane wrote:
>> Floris Bos / Maxnet <bos(at)je-eigen-domein(dot)nl> writes:
>>> postgres(at)db:/data$ /opt/postgres/8.4-beta/bin/64/initdb -E SQL_ASCII -X
>>> /data/pg_xlog /data/db
>>> The database cluster will be initialized with locale en_US.UTF-8.
>>
>> Oooh, that doesn't look real good. You're going to be using strcoll()
>> comparisons that assume the data is in UTF8, but the database is not
>> enforcing valid UTF8 encoding. I have not checked the dump to see if
>> it's all valid data, but this could be the root of the issue.
>>
>> If you want to use SQL_ASCII because the data isn't uniformly encoded,
>> it'd be better to use C locale.

> Darn.
> Looks like you are right!
> Works a lot better with "--locale=C"

> My 8.3 PostgreSQL installation ran under FreeBSD, and there the locale
> is C by default, so I was not used to having to add a "--locale=C"
> option.
> Under OpenSolaris it's indeed UTF-8 by default.

Yeah, this is kind of unfortunate. I'm not sure there is much we could
do about it, unless we want to insist that C locale be used if the
database encoding is SQL_ASCII. That cure seems worse than the disease
though. We have locked down encoding/locale combinations pretty
strictly for 8.4, but SQL_ASCII is generally thought to be a "let the
user beware" setting.
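
The mismatch discussed above (a UTF-8 locale collating bytes that the
database never validated) can be illustrated outside PostgreSQL. The
following is only a minimal sketch in Python, not PostgreSQL code: the
helper names and the sample values are hypothetical, and it assumes that
C-locale comparison amounts to a plain byte-by-byte comparison.

```python
# Sketch only: the helper names and sample data are illustrative
# assumptions and are not part of PostgreSQL.

def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def c_locale_sort(values):
    """Sort byte strings by unsigned byte value.

    Python orders bytes objects lexicographically by byte value, which
    is assumed here to match what C-locale comparison does.
    """
    return sorted(values)


# A SQL_ASCII database accepts all three of these without complaint.
samples = [
    "café".encode("utf-8"),    # valid UTF-8
    "café".encode("latin-1"),  # ends in a lone 0xE9 byte: invalid UTF-8
    b"cafe",                   # plain ASCII
]

for raw in samples:
    print(raw, "valid UTF-8" if is_valid_utf8(raw) else "NOT valid UTF-8")

# Byte-wise ordering is well defined for every value, valid UTF-8 or not.
print(c_locale_sort(samples))
```

Byte-wise ordering never has to interpret the data, so it stays
consistent for mixed or invalid encodings. A UTF-8 locale's strcoll()
has to interpret the bytes as UTF-8, and its behavior on input like the
second sample is not something an index can safely depend on.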

regards, tom lane
