Re: How slow is DISTINCT?

From: Michael Contzen <mcontzen(at)dohle(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: How slow is DISTINCT?
Date: 2002-04-09 10:15:47
Message-ID: 3CB2BF53.77D41E79@dohle.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Hello,

our problem to 'select distinct' is similar: We have a big table of
about 320.000.000 rows of an datawarehouse application. 55 GB, yes, it
works fine!

Createing an index on it quite fast (nearly as fast than oracle on the
same data and same machine) 1GHz P4, 4GB-Ram -> 30 Minutes.

Then we tried 'select distinct one_field' which would result to about
200.000 different values. Postgres needed 6 hours while Oracle managed
it in about 30 minutes.

Looking into the pgsql_tmp directory of this db while doing this
selection showed me a lot of tempfiles nearly as big as the table.

Does postgres sort the whole table without projection to one column an
performs a unique on this whole table?
This would explain the big amount of disk usage in pgsql_tmp and the big
amount of time.

The statement, something could be wrong with the data, is not very
useful: This is data of our electronic cash-desks. Unfortunately our
customers buy every day nearly the same articles - therefor the
repeatition of data :-)

Kind regards

M.Contzen
Developer
Dohle Systemberatung
Germany

Some facts of our test:

Table "warenausg_ges"
Column | Type | Modifiers
-----------+---------------+-----------
ean | numeric(13,0) |
menge | numeric(13,3) |
lvkumsatz | numeric(15,3) |
vkumsatz | numeric(15,3) |
ekumsatz | numeric(15,3) |
rabatt | numeric(12,0) |
kdnr | numeric(10,0) |
artnr | numeric(10,0) |
lfnr | numeric(10,0) |
wg | integer |
aktion | character(1) |
datum | date |
status | integer |
Indexes: warenausg_ges_inx

Index "warenausg_ges_inx"
Column | Type
--------+------
datum | date
btree

explain select distinct artnr from warenausg_ges;
NOTICE: QUERY PLAN:

Unique (cost=224522801.22..225315849.86 rows=31721946 width=12)
-> Sort (cost=224522801.22..224522801.22 rows=317219456 width=12)
-> Seq Scan on warenausg_ges (cost=0.00..165793667.00
rows=317219456 width=12)

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Søren Vainio 2002-04-09 11:28:38 Re: Scadinavian characters in regular expressions
Previous Message Andreas Joseph Krogh 2002-04-09 09:53:29 Re: Scadinavian characters in regular expressions