| From: | Richard van den Berg <richard(dot)vandenberg(at)trust-factory(dot)com> | 
|---|---|
| To: | pgsql-general(at)postgresql(dot)org | 
| Subject: | Character encoding and string matches | 
| Date: | 2005-12-07 09:20:22 | 
| Message-ID: | 4396A956.5070900@trust-factory.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-general | 
I just upgraded our database from PostgreSQL 7.4.7 to 8.1.0. Our
original character enconding was set to the default of 7.4, being
SQL_ASCII. Along the way I realized our data does have non-ASCII
strings, so I decided to use this upgrade to fix this setting. Our new
8.1.0 database now has the LATIN1 encoding. The pg_dump / pg_restore
went fine with respect to the encodings. The LATIN1 characters are
correct at the places where I expect them.
Most of our data is supplied the customers, and sometimes we see bogus
characters in strings. It's something we live with and we always report
this to them in the hope it will not happen again. During the
pg_restore, it complained about failing to create several indexes
because of duplicates. Inspecting these cases showed that the bogus
characters are to blame. It seems that these characters (that are
probably not defined in LATIN1) are now turned into wildcards: they will
match any other character. Is this an expected result from running a non
SQL_ASCII database?
An interesting example is this (I hope my E-mail character encoding is
set up correctly):
apps\ì_dir\
apps\ý_dir\
There are 2 different non-ascii characters before the "_dir". Postgres
treats the first one as a wildcard, but not the second one. Strangely
enough both characters are in the ISO-8859-1 table: 0xEC and 0xFD.
Sincerely,
-- 
Richard van den Berg, CISSP
-------------------------------------------
Trust Factory B.V. |     www.dna-portal.net
Bazarstraat 44a    |  www.trust-factory.com
2518AK The Hague   |  Phone: +31 70 3620684
The Netherlands    |  Fax  : +31 70 3603009
-------------------------------------------
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Sim Zacks | 2005-12-07 10:11:22 | pg_autovacuum | 
| Previous Message | Formación On-Line - Formabyte | 2005-12-07 09:19:59 | CURSOS FORMACIÓN INFORMATICA ONLINE |