
Re: another seemingly simple encoding question

From:	"John D(dot) Burger" <john(at)mitre(dot)org>
To:	PostgreSQL-general general <pgsql-general(at)postgresql(dot)org>
Subject:	Re: another seemingly simple encoding question
Date:	2006-03-24 14:47:19
Message-ID:	fff3e8bce85a8c49e8d81ea4b45e367e@mitre.org
Lists:	pgsql-general

This doesn't sound like your problem, but I'll explain the
normalization issue using Korean as an example, since that seems to be
your data: Unicode has codepoints both for precomposed Hangul syllables
and for the individual Jamo, so a Hangul syllable can be represented
either with the single corresponding codepoint or as a sequence of two
or three Jamo codepoints. A Unicode font would display these two
alternatives identically, but in any Unicode encoding, including UTF-8,
the two strings are not byte-for-byte identical. The four Unicode
normalization forms (NFC, NFD, NFKC, NFKD) are algorithms for
normalizing strings in such a way that they do compare identically.
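A minimal Python sketch of the Hangul/Jamo case, assuming only the standard-library `unicodedata` module (Python is used here purely for illustration; nothing in the thread relies on it). It spells the syllable 가 both as the single precomposed codepoint U+AC00 and as two conjoining Jamo (U+1100, U+1161), shows that their UTF-8 bytes differ, and shows that normalizing both sides to the same form makes them compare equal.

```python
import unicodedata

# "ga" as one precomposed Hangul syllable codepoint: U+AC00
precomposed = "\uac00"
# The same syllable spelled with two conjoining Jamo:
# U+1100 HANGUL CHOSEONG KIYEOK + U+1161 HANGUL JUNGSEONG A
decomposed = "\u1100\u1161"

# They render the same with a suitable font, but the UTF-8 bytes differ
print(precomposed.encode("utf-8"))  # b'\xea\xb0\x80'
print(decomposed.encode("utf-8"))   # b'\xe1\x84\x80\xe1\x85\xa1'
print(precomposed == decomposed)    # False

# Normalizing both strings to the same form makes them compare equal
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    same = (unicodedata.normalize(form, precomposed)
            == unicodedata.normalize(form, decomposed))
    print(form, same)
```

NFC tends to produce the precomposed syllable and NFD the Jamo sequence; either works for comparison as long as both strings are normalized to the same form before comparing.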

Anyway, it sounds like you have the opposite problem, two strings that
are comparing equal when you think they shouldn't. I don't know that
anyone can help you unless you post an actual example of two such
strings.

- John D. Burger
MITRE

In response to

Re: another seemingly simple encoding question at 2006-03-24 14:43:45 from joseph
