Re: Duplicate Values or Not?!

From: Greg Stark <gsstark(at)mit(dot)edu>
To: John Seberg <johnseberg(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Duplicate Values or Not?!
Date: 2005-09-17 05:36:50
Message-ID: 87r7boqmq5.fsf@stark.xeocode.com
Lists: pgsql-general

John Seberg <johnseberg(at)yahoo(dot)com> writes:

> I recently tried to CREATE a UNIQUE INDEX and could
> not, due to duplicate values:
>
> CREATE UNIQUE INDEX usr_login ON usr (login);
>
> To try to find the offending row(s), I then executed
> the following:
>
> SELECT count(*), login FROM usr GROUP BY login ORDER
> BY 1 DESC;
>
> The GROUP BY didn't group anything, indicating to me
> that there were no duplicate values. There were the
> same number of rows in this query as a simple SELECT
> count(*) FROM usr.
>
> This tells me that Postgresql is not using the same
> method for determining duplicates when GROUPING and
> INDEXing.

You might try running the GROUP BY query after doing:

set enable_hashagg = false;
select ...

With hash aggregation disabled, the planner has to sort the rows in order to
group them, and that sort should go through exactly the same comparison code
the index build is using. I think.
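
To make the idea concrete, here is a toy sketch in Python of why a hash-based
grouping and a sort-based grouping can disagree about duplicates. It is not
PostgreSQL's code. The helper names are made up for illustration, and
str.casefold is only a stand-in for whatever comparison the sort and the index
really use; the assumption is simply that the comparison treats some
byte-different strings as equal.

```python
from itertools import groupby


def group_by_hash(values):
    """Count values using exact equality, the way a hash table keyed on the raw string would."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return counts


def group_by_sort(values, key):
    """Sort with a comparison key, then count runs of rows whose keys compare equal."""
    ordered = sorted(values, key=key)
    return [(k, len(list(g))) for k, g in groupby(ordered, key=key)]


# Hypothetical data: two logins that differ byte-wise but compare equal
# under the stand-in comparison (case folding).
logins = ["alice", "Alice", "bob"]

hashed = group_by_hash(logins)                            # three separate groups
sorted_groups = group_by_sort(logins, key=str.casefold)   # "alice" and "Alice" collapse

print(len(hashed), sorted_groups)
```

If the two approaches report different group counts on the same data, the
comparison function, not the data count, is where to look.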

That doesn't really answer the rest of your questions. The short of it is that
setting the database encoding doesn't magically make your data encoded in that
encoding. If your client sends data in one encoding but claims it's Unicode,
then Postgres will happily store it in a UNICODE database, and it'll be garbage.
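
As a rough illustration of the mislabeling problem (plain Python, nothing
Postgres-specific, and the sample string is made up), here is what happens when
bytes produced in Latin-1 are treated as if they were UTF-8:

```python
# A client encodes the text as Latin-1 ...
raw = "Jos\u00e9".encode("latin-1")      # b'Jos\xe9'

# ... but the bytes are then interpreted as UTF-8.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    # 0xE9 starts a multi-byte UTF-8 sequence that never completes.
    valid_utf8 = False

# Decoding leniently does not recover the text; the accented letter is lost.
garbled = raw.decode("utf-8", errors="replace")

# The correctly encoded UTF-8 bytes are a different byte sequence entirely.
proper = "Jos\u00e9".encode("utf-8")     # b'Jos\xc3\xa9'

print(valid_utf8, repr(garbled), raw == proper)
```

The stored bytes are whatever the client sent; the label only changes how they
get interpreted later.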

Maybe someone else will have more specific advice on that front.

--
greg
