Re: Facing issue in using special characters

From: "Warner, Gary, Jr" <gar(at)uab(dot)edu>
To: M Tarkeshwar Rao <m(dot)tarkeshwar(dot)rao(at)ericsson(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>, "pgsql-hackers-owner(at)postgresql(dot)org" <pgsql-hackers-owner(at)postgresql(dot)org>
Subject: Re: Facing issue in using special characters
Date: 2019-03-17 15:01:40
Message-ID: B446C5BC-7195-4BA0-80E6-A15D5CBDF365@uab.edu
Lists: pgsql-general pgsql-hackers pgsql-performance

Many of us have faced character encoding issues because we are not in control of our input sources and have made the common assumption that UTF-8 covers everything.

In my lab, as an example, some of our social media posts have used the ZawGyi Burmese character set rather than Unicode Burmese. (Because Myanmar developed its technology in an environment closed to the world, it ended up with its own non-standard character set, which is still very common on mobile phones.) We had fully tested the app with Unicode Burmese, but honestly didn’t know ZawGyi was even a thing that we would see in our dataset. We’ve also had problems with non-Unicode word separators in Arabic.

What we’ve found to be helpful is to view the troublesome text in a hex editor and determine which non-standard characters may be causing the problem.
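If a hex editor is not handy, a few lines of Python can give a similar view for text that has already been decoded. This is only a sketch: the function name describe_suspects and the sample string are made up for illustration, and treating every non-ASCII character as a suspect is an assumption that will be too broad for many datasets.

```python
import unicodedata

def describe_suspects(text):
    """List every non-ASCII character with its position, code point and Unicode name."""
    report = []
    for index, ch in enumerate(text):
        if ord(ch) > 127:  # assumption: anything outside ASCII is worth a look
            name = unicodedata.name(ch, "<unnamed>")
            report.append((index, ch, f"U+{ord(ch):04X}", name))
    return report

# Hypothetical sample: an accented letter plus an invisible zero-width character.
sample = "caf\u00e9\u200cbar"
for row in describe_suspects(sample):
    print(row)
```

Invisible characters such as zero-width joiners show up here by name, which is often quicker than reading raw hex.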

It may be that some data conversion is necessary before insertion. But the first step is knowing WHICH characters are causing the issue.
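When the data arrives as raw bytes, one way to find the offending characters is to try decoding and record where it fails. The sketch below assumes the target encoding is UTF-8; find_invalid_utf8 is a hypothetical helper name, and the sample bytes are invented. It only catches byte sequences that are not valid UTF-8, so text that is valid Unicode but in an unexpected character set (the ZawGyi case) would pass through unnoticed.

```python
def find_invalid_utf8(raw: bytes):
    """Return (offset, hex) for each byte sequence that fails to decode as UTF-8."""
    problems = []
    pos = 0
    while pos < len(raw):
        try:
            raw[pos:].decode("utf-8")
            break  # the rest of the buffer decodes cleanly
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice we tried to decode
            start = pos + err.start
            end = pos + err.end
            problems.append((start, raw[start:end].hex()))
            pos = end  # resume scanning just past the bad bytes
    return problems

# Hypothetical input: a Latin-1 "e acute" (0xE9) and a stray 0xFF inside ASCII text.
print(find_invalid_utf8(b"abc\xe9def\xff"))
```

Knowing the offsets and byte values makes it easier to guess the real source encoding and to decide what conversion, if any, to apply before the insert.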
