Re: Distinct oddity

From: Maximilian Tyrtania <maximilian(dot)tyrtania(at)onlinehome(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Distinct oddity
Date: 2009-05-09 07:22:24
Message-ID: C62AFBD0.35F11%maximilian.tyrtania@onlinehome.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

am 08.05.2009 16:55 Uhr schrieb Rob Sargent unter robsargent(at)rocketmail(dot)com:

>Is firmen a table or a view?

It's a table.

am 08.05.2009 21:52 Uhr schrieb Tom Lane unter tgl(at)sss(dot)pgh(dot)pa(dot)us:

>> It doesn't seem to be related to null values (which wouldn't explain it
>> anyway) nor to this particular field...
>
> Can you reproduce it in other contexts than specifically count(distinct)?
> In particular I'd try
>
> select count(*) from
> (select distinct f.bezeichnung from firmen f) ss;

FAKDB=# select count(*) from
FAKDB-# (select distinct f.bezeichnung from firmen f) ss;
count
-------
73437
(1 row)

>
> select count(*) from
> (select distinct f.bezeichnung||'e' from firmen f) ss;

FAKDB=# select count(*) from
(select distinct f.bezeichnung||'e' from firmen f) ss;
count
-------
72535
(1 row)

> If those give the same numbers as you're showing here, then the
> next step would be to dump out the actual results of the SELECT DISTINCT
> queries and compare them --- looking at the actual data values should
> give some insight as to what's happening.

FAKDB=# select distinct f.bezeichnung from firmen f order by 1 limit 5
FAKDB-# ;
bezeichnung
-----------------------------------------

Šsterreichisches Verkehrsbro AG
\x01Assistenz
\x10Frohstoff Design & Textilveredelung
"1. Mittelschule ""Am Kupferberg"""
(5 rows)

FAKDB=# select distinct f.bezeichnung||'e' from firmen f order by 1 limit 5
;
?column?
------------------------------------------
Österreich/Welt (Ltg.)e
Šsterreichisches Verkehrsbro AGe
\x01Assistenze
\x10Frohstoff Design & Textilveredelunge
"1. Mittelschule ""Am Kupferberg"""e
(5 rows)

Aha, the "Österreich/Welt (Ltg.)"-entry is missing in the 1st query. So that
does smell like a locale problem.

> BTW, what is the datatype of f.bezeichnung,

It's character varying(255). Just for the record:

FAKDB=# explain analyze select distinct f.bezeichnung||'e' from firmen f
order by 1 limit 5;
QUERY PLAN
----------------------------------------------------------------------------
---------------------------------------------------------
Limit (cost=16173.07..16174.08 rows=5 width=18) (actual
time=1667.841..1667.855 rows=5 loops=1)
-> Unique (cost=16173.07..16933.66 rows=3765 width=18) (actual
time=1667.839..1667.851 rows=5 loops=1)
-> Sort (cost=16173.07..16553.36 rows=152117 width=18) (actual
time=1667.837..1667.844 rows=5 loops=1)
Sort Key: (((bezeichnung)::text || 'e'::text))
Sort Method: external merge Disk: 4640kB
-> Seq Scan on firmen f (cost=0.00..13646.46 rows=152117
width=18) (actual time=0.069..353.777 rows=152118 loops=1)
Total runtime: 1669.998 ms
(7 rows)

> and what locale are you
> running in?

lc_collate | de_DE
| Shows the collation order locale.
lc_ctype | de_DE
| Shows the character classification and case conversion locale.

The encoding is UTF-8.

Best,

Maximilian Tyrtania

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Jasen Betts 2009-05-09 11:05:59 Re: backup and restore
Previous Message Emi Lu 2009-05-08 21:05:27 view vs. tables used to define that view