Re: How slow is DISTINCT?

From: "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au>
To: "Michael Contzen" <mcontzen(at)dohle(dot)com>, <pgsql-sql(at)postgresql(dot)org>
Subject: Re: How slow is DISTINCT?
Date: 2002-04-16 01:53:51
Message-ID: GNELIHDDFBOCMGBFGEFOIECFCCAA.chriskl@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

What happens if you do...

select artnr from warenausg_ges group by artnr;

or even

select distinct (artnr) from warenausg_ges;

or

select distinct on (artnr) from warenausg_ges;

...sort of ideas...

Another idea is to maintain a table of the unique values in the column.
Create a trigger on the main table to maintian the list of unique values...

Chris

> -----Original Message-----
> From: pgsql-sql-owner(at)postgresql(dot)org
> [mailto:pgsql-sql-owner(at)postgresql(dot)org]On Behalf Of Michael Contzen
> Sent: Tuesday, 9 April 2002 6:16 PM
> To: pgsql-sql(at)postgresql(dot)org
> Subject: Re: [SQL] How slow is DISTINCT?
>
>
> Hello,
>
> our problem to 'select distinct' is similar: We have a big table of
> about 320.000.000 rows of an datawarehouse application. 55 GB, yes, it
> works fine!
>
> Createing an index on it quite fast (nearly as fast than oracle on the
> same data and same machine) 1GHz P4, 4GB-Ram -> 30 Minutes.
>
> Then we tried 'select distinct one_field' which would result to about
> 200.000 different values. Postgres needed 6 hours while Oracle managed
> it in about 30 minutes.
>
> Looking into the pgsql_tmp directory of this db while doing this
> selection showed me a lot of tempfiles nearly as big as the table.
>
> Does postgres sort the whole table without projection to one column an
> performs a unique on this whole table?
> This would explain the big amount of disk usage in pgsql_tmp and the big
> amount of time.
>
> The statement, something could be wrong with the data, is not very
> useful: This is data of our electronic cash-desks. Unfortunately our
> customers buy every day nearly the same articles - therefor the
> repeatition of data :-)
>
>
> Kind regards
>
> M.Contzen
> Developer
> Dohle Systemberatung
> Germany
>
>
> Some facts of our test:
>
> Table "warenausg_ges"
> Column | Type | Modifiers
> -----------+---------------+-----------
> ean | numeric(13,0) |
> menge | numeric(13,3) |
> lvkumsatz | numeric(15,3) |
> vkumsatz | numeric(15,3) |
> ekumsatz | numeric(15,3) |
> rabatt | numeric(12,0) |
> kdnr | numeric(10,0) |
> artnr | numeric(10,0) |
> lfnr | numeric(10,0) |
> wg | integer |
> aktion | character(1) |
> datum | date |
> status | integer |
> Indexes: warenausg_ges_inx
>
> Index "warenausg_ges_inx"
> Column | Type
> --------+------
> datum | date
> btree
>
> explain select distinct artnr from warenausg_ges;
> NOTICE: QUERY PLAN:
>
> Unique (cost=224522801.22..225315849.86 rows=31721946 width=12)
> -> Sort (cost=224522801.22..224522801.22 rows=317219456 width=12)
> -> Seq Scan on warenausg_ges (cost=0.00..165793667.00
> rows=317219456 width=12)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Josh Berkus 2002-04-16 02:51:11 Re: please advise on column data type
Previous Message jack 2002-04-16 01:25:04 please advise on column data type