Re: How to remove non-UTF values from a table?

From: Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com>
To: PG-General Mailing List <pgsql-general(at)postgresql(dot)org>
Subject: Re: How to remove non-UTF values from a table?
Date: 2009-12-14 12:21:08
Message-ID: e373d31e0912140421s15834f41xcf1c4c1a5a18ad02@mail.gmail.com
Lists: pgsql-general

Actually, I just realized that the SQL below also flags perfectly
valid-looking values like this one:

http://factfinder.census.gov/servlet/ReferenceMapFramesetServlet?_bm=y&-zip=27340&-PANEL_ID=rm_result&-_MapEvent=zoomToAddress&-street=&-city=&-rm_config=|b=50|l=en|t=420|zf=0.0|ms=ref_legal_00dec|dw=0.21626605473484609|dh=0.13180874155445527|dt=gov.census.aff.domain.map.EnglishMapExtent|if=gif|cx=-79.8023|cy=35.827|zl=5|pz=5|bo=404:315:314:313:323:321:319|bl=362:360:393:392:355:354:385|ft=350:349:335:389:388:332:331|fl=381:403:204:380:369:379:368|g=16000US3752760&-tree_id=420&-errMsg=&-redoLog=false&-geo_id=16000US3752760&-states=

Which part of this is non-UTF8? Why does the check report this value
as corrupted in a UTF8 table? lc_collate and every other setting I can
think of are already UTF-8!

Thanks for any pointers.
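
One way to answer "which part is non-UTF8?" is to ask a UTF-8 decoder where it first fails. The sketch below is a minimal, hypothetical Python helper (the name first_invalid_utf8 is not from this thread). It assumes the raw bytes of the value are available, for example read from the dump file in binary mode. A string made up only of ASCII characters, as the URL above appears to be, is always valid UTF-8, so for such a value the helper would return None.

```python
from typing import Optional, Tuple


def first_invalid_utf8(data: bytes) -> Optional[Tuple[int, bytes]]:
    """Return (offset, offending bytes) for the first invalid UTF-8
    sequence in data, or None if data decodes cleanly.

    Hypothetical helper for illustration only.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start / exc.end delimit the bytes the decoder rejected.
        return exc.start, data[exc.start:exc.end]
    return None


if __name__ == "__main__":
    # Pure ASCII is always valid UTF-8.
    print(first_invalid_utf8(b"http://example.org/?a=1|b=2"))
    # A lone Latin-1 byte (0xE9) is not valid UTF-8.
    print(first_invalid_utf8(b"caf\xe9 au lait"))
```

If this reports nothing for a value that a SQL check flags, the check itself may be matching something other than invalid byte sequences.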

On Mon, Dec 14, 2009 at 7:04 PM, Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com> wrote:
> Actually the title of my email should have been "how to **replace**
> utf-8 values".
>
> Thanks.
>
>
>
> On Mon, Dec 14, 2009 at 7:03 PM, Phoenix Kiula <phoenix(dot)kiula(at)gmail(dot)com> wrote:
>> An easy question for some I hope.
>>
>> I have a DB from the 8.2 days. When I dump it now and try to load it
>> into 8.3.7, I get errors about utf-8 stuff.
>>
>> I tried searching this list's archives but could not come up with an answer.
>>
>> Google returns some sites like these:
>> http://sniptools.com/databases/finding-non-utf8-values-in-postgresql -
>> but I'm not clear on how to use them.
>>
>> Following the SQL on this site I could identify some columns that
>> contain text like this:
>>
>>    "Évolution générale de la situation démographique"
>>
>> So my guess is that the non-English characters were originally not
>> getting written in proper utf-8 variants.
>>
>> Is there a way in SQL to find these values and replace them with
>> their utf-8 equivalents using some postgresql commands? I couldn't
>> find anything under "String Functions" (chapter 9 of the manual).
>>
>> We're on CentOS.
>>
>> Thanks!
>>
>
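
The garbled sample quoted above has the shape of text that was double-encoded: the UTF-8 bytes for "É" (0xC3 0x89) read back as the two Windows-1252 characters "Ã" and "‰", and likewise "é" became "Ã©". The following is a minimal sketch in Python of reversing that, under the assumption that the intermediate encoding really was Windows-1252 (it could also have been Latin-1 or something else). The function name fix_double_encoded is hypothetical, and values that do not fit the pattern are returned unchanged.

```python
def fix_double_encoded(text: str) -> str:
    """Undo one round of 'UTF-8 bytes misread as Windows-1252'.

    Assumption: the damage came from cp1252; hypothetical helper,
    not something taken from this thread.
    """
    try:
        # Map each character back to the single byte it came from,
        # then interpret the resulting bytes as UTF-8.
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in cp1252, or the bytes are not valid
        # UTF-8: the value was probably fine, so leave it alone.
        return text


if __name__ == "__main__":
    broken = "Ã‰volution gÃ©nÃ©rale de la situation dÃ©mographique"
    print(fix_double_encoded(broken))
```

Running a repair like this on a copy of the data first, and reviewing what changed, would be prudent, since a few legitimate strings can also happen to match the pattern.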
