From: | Justin Pasher <justinp(at)newmediagateway(dot)com> |
---|---|
To: | Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Best practices for moving UTF8 databases |
Date: | 2009-07-22 16:24:41 |
Message-ID: | 4A673D49.7000905@newmediagateway.com |
Lists: | pgsql-general |
Phoenix Kiula wrote:
> I tried this. Get an error.
>
>
> mypg=# select * from interesting WHERE NOT description ~ ( '^('||
> mypg(# $$[\09\0A\0D\x20-\x7E]|$$|| -- ASCII
> mypg(# $$[\xC2-\xDF][\x80-\xBF]|$$|| -- non-overlong 2-byte
> mypg(# $$\xE0[\xA0-\xBF][\x80-\xBF]|$$|| -- excluding overlongs
> mypg(# $$[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|$$|| -- straight 3-byte
> mypg(# $$\xED[\x80-\x9F][\x80-\xBF]|$$|| -- excluding surrogates
> mypg(# $$\xF0[\x90-\xBF][\x80-\xBF]{2}|$$|| -- planes 1-3
> mypg(# $$[\xF1-\xF3][\x80-\xBF]{3}|$$|| -- planes 4-15
> mypg(# $$\xF4[\x80-\x8F][\x80-\xBF]{2}$$|| -- plane 16
> mypg(# '*)$' )
> mypg-#
> mypg-# ;
> ERROR: invalid regular expression: quantifier operand invalid
>
If you really don't want to go the "pg_dump -> iconv (remove invalid
characters) -> diff the dump files" route, a stored procedure that
scans for invalid UTF-8 characters was posted to pgsql-hackers a few
years back:
http://archives.postgresql.org/pgsql-hackers/2005-12/msg00511.php
http://svana.org/kleptog/pgsql/utf8_verify.sql
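
For comparison, the dump-and-check route can be approximated without
iconv or diff. The following is only a rough sketch, not the stored
procedure linked above: it assumes the pg_dump output is already
available as raw bytes, and the function name find_invalid_lines is
made up for illustration. It relies on Python's strict UTF-8 decoder,
which rejects overlong forms and surrogates, but unlike the regex
quoted above it does not flag ASCII control characters.

```python
def find_invalid_lines(dump_bytes):
    """Return (line_number, byte_offset, offending_bytes) for every line
    of a dump that is not well-formed UTF-8.

    Hypothetical helper for illustration; only the first invalid
    sequence on each line is reported.
    """
    bad = []
    for lineno, line in enumerate(dump_bytes.split(b"\n"), start=1):
        try:
            # Strict decoding raises on the first malformed sequence.
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            bad.append((lineno, exc.start, line[exc.start:exc.end]))
    return bad


if __name__ == "__main__":
    # Stand-in for the contents of a dump file read in binary mode.
    sample = b"good line\ncaf\xe9 bad\nna\xc3\xafve ok\n"
    for lineno, offset, raw in find_invalid_lines(sample):
        print(f"line {lineno}, byte {offset}: {raw!r}")
```

The reported line numbers point at the rows in the dump that would need
fixing in the source database before the move.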
--
Justin Pasher