Character encoding and string matches

From: Richard van den Berg <richard(dot)vandenberg(at)trust-factory(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Character encoding and string matches
Date: 2005-12-07 09:20:22
Message-ID: 4396A956.5070900@trust-factory.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I just upgraded our database from PostgreSQL 7.4.7 to 8.1.0. Our
original character enconding was set to the default of 7.4, being
SQL_ASCII. Along the way I realized our data does have non-ASCII
strings, so I decided to use this upgrade to fix this setting. Our new
8.1.0 database now has the LATIN1 encoding. The pg_dump / pg_restore
went fine with respect to the encodings. The LATIN1 characters are
correct at the places where I expect them.

Most of our data is supplied the customers, and sometimes we see bogus
characters in strings. It's something we live with and we always report
this to them in the hope it will not happen again. During the
pg_restore, it complained about failing to create several indexes
because of duplicates. Inspecting these cases showed that the bogus
characters are to blame. It seems that these characters (that are
probably not defined in LATIN1) are now turned into wildcards: they will
match any other character. Is this an expected result from running a non
SQL_ASCII database?

An interesting example is this (I hope my E-mail character encoding is
set up correctly):

apps\ì_dir\
apps\ý_dir\

There are 2 different non-ascii characters before the "_dir". Postgres
treats the first one as a wildcard, but not the second one. Strangely
enough both characters are in the ISO-8859-1 table: 0xEC and 0xFD.

Sincerely,

--
Richard van den Berg, CISSP
-------------------------------------------
Trust Factory B.V. | www.dna-portal.net
Bazarstraat 44a | www.trust-factory.com
2518AK The Hague | Phone: +31 70 3620684
The Netherlands | Fax : +31 70 3603009
-------------------------------------------

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Sim Zacks 2005-12-07 10:11:22 pg_autovacuum
Previous Message Formación On-Line - Formabyte 2005-12-07 09:19:59 CURSOS FORMACIÓN INFORMATICA ONLINE