Duplicate Values or Not?!

From: John Seberg <johnseberg(at)yahoo(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Duplicate Values or Not?!
Date: 2005-09-16 20:24:25
Message-ID: 20050916202425.10977.qmail@web50202.mail.yahoo.com
Lists: pgsql-general

I recently tried to CREATE a UNIQUE INDEX and could
not, due to duplicate values:

CREATE UNIQUE INDEX usr_login ON usr (login);

To try to find the offending row(s), I then executed
the following:

SELECT count(*), login FROM usr GROUP BY login ORDER BY 1 DESC;

The GROUP BY didn't group anything, indicating to me
that there were no duplicate values. There were the
same number of rows in this query as a simple SELECT
count(*) FROM usr.

This tells me that PostgreSQL is not using the same
method for determining duplicates when grouping
(GROUP BY) and when building a UNIQUE INDEX.
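
One way this might be checked, as a rough sketch: it assumes the
GROUP BY is being run as a hash aggregate (which groups on the raw
bytes) while the index build sorts with the locale's collation
rules. That is an assumption, not something confirmed here.
Turning hash aggregation off for the session forces a sort-based
GROUP BY, and comparing lc_collate on the two servers would show
whether their locales differ:

```sql
-- Locale used for sorting; compare this value between the two servers.
SHOW lc_collate;

-- Assumption: the planner chose a HashAggregate for the GROUP BY.
-- Disabling it for this session forces a sort-based grouping instead.
SET enable_hashagg = off;

-- Same query as before, restricted to groups with more than one row.
SELECT count(*), login
FROM usr
GROUP BY login
HAVING count(*) > 1
ORDER BY 1 DESC;

-- Restore the default planner behaviour.
RESET enable_hashagg;
```

If rows show up here that did not show up before, the two
operations really are comparing the strings differently.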

I dug a little deeper. The best candidates I found for
possible duplicates are caused by characters that did
not translate well. IIRC, the basis was the name Pena,
which looked like Pe?a. I'm thinking the original data
was not encoded properly, or my export didn't handle
encodings properly, etc. The two Penas used different
characters in the 3rd position, neither of which was
translated correctly.
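
A minimal sketch of how the differing characters could be
inspected, assuming a SQL_ASCII database where each byte counts as
one character. The 'Pe%' pattern is only a hypothetical filter for
the affected names and would need adjusting to the real data:

```sql
-- Show the byte value at position 3 of each candidate login.
-- In a SQL_ASCII database, substr() and ascii() work on single bytes.
SELECT login,
       length(login) AS len,
       ascii(substr(login, 3, 1)) AS third_byte
FROM usr
WHERE login LIKE 'Pe%'   -- hypothetical filter, adjust as needed
ORDER BY login;
```

Two rows that look identical on screen but report different
third_byte values would be the pair in question.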

I loaded the data from another database vendor (4th
Dimension) into PostgreSQL 8.0.3, which I had
compiled from source with the default configuration.
This was on Yellow Dog Linux 4.0.1.

I brought the same data into an 8.0.1 on Mac OS X
(binary from entropy.ch) and did NOT have this UNIQUE
INDEX failure.

I'm sure my problems are deeper than the INDEX
failure, involving the accuracy of the conversion,
but, short term, I would like to know what is
different. They both are SQL_ASCII databases. I tried
importing into a UNICODE database, but that was really
a mess of errors (during COPY).

I realize I need to learn about encodings, my source
data, etc., but I'm looking for hints. Is anybody
experienced with exporting 4th Dimension data containing
a certain amount of foreign language text?

Thanks,


