Quick Links

Re: big distinct clause vs. group by

From:	Uwe Bartels <uwe(dot)bartels(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: big distinct clause vs. group by
Date:	2011-04-18 17:13:28
Message-ID:	BANLkTimPgtYKLyEE8K0NKoE=DAvgm7p8+A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Hi Robert,

thanks for your answer.
the aggregate function I was talking about is the function I need to use for
the non-group by columns like min() in my example.
There are of course several function to choose from, and I wanted to know
which causes as less as possible resources.

best regards,
Uwe

On 18 April 2011 18:19, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Wed, Mar 16, 2011 at 4:45 AM, Uwe Bartels <uwe(dot)bartels(at)gmail(dot)com>
> wrote:
> > I'm having trouble with some sql statements which use an expression with
> > many columns and distinct in the column list of the select.
> > select distinct col1,col2,.....col20,col21
> > from table1 left join table2 on <join condition>,...
> > where
> > <other expressions>;
> >
> > The negative result is a big sort with teporary files.
> > -> Sort (cost=5813649.93..5853067.63 rows=15767078
> width=80)
> > (actual time=79027.079..81556.059 rows=12076838 loops=1)
> > Sort Method: external sort Disk: 1086096kB
> > By the way - for this query I have a work_mem of 1 GB - so raising this
> > further is not generally possible - also not for one special command, due
> to
> > parallelism.
> >
> > How do I get around this?
>
> Hmm. It seems to me that there's no way to work out the distinct
> values without either sorting or hashing the output, which will
> necessarily be slow if you have a lot of data.
>
> > I have one idea and like to know if there any other approaches or an even
> > known better solution to that problem. By using group by I don't need the
> > big sort for the distinct - I reduce it (theoreticly) to the key columns.
> >
> > select <list of key columns>,<non key column>
> > from tables1left join table2 on <join condition>,...
> > where
> > <other conditions>
> > group by <list of key columns>
>
> You might try SELECT DISTINCT ON (key columns) <key columns> <non-key
> columns> FROM ...
>
> > Another question would be what's the aggregate function which needs as
> less
> > as possible resources (time).
>
> Not sure I follow this part.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

In response to

Re: big distinct clause vs. group by at 2011-04-18 16:19:34 from Robert Haas

Responses

Re: big distinct clause vs. group by at 2011-04-19 08:24:18 from Robert Klemme
Re: big distinct clause vs. group by at 2011-04-23 19:34:11 from Robert Haas

Browse pgsql-performance by date

	From	Date	Subject
Next Message	John Rouillard	2011-04-18 18:12:41	Assessing performance of fetches
Previous Message	Robert Haas	2011-04-18 16:51:59	Re: Adding additional index causes 20,000x slowdown for certain select queries - postgres 9.0.3