Re: Distinct + Limit

From: Ants Aasma <ants(at)cybertec(dot)at>
To: Francois Deliege <fdeliege(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Distinct + Limit
Date: 2012-03-28 00:18:24
Message-ID: CA+CSw_vgkjafVLfDz_gVYh_nf6rWWx_OBcQaB9YNQ3gAt29+qg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, Mar 27, 2012 at 11:54 PM, Francois Deliege <fdeliege(at)gmail(dot)com> wrote:
> select col1 from table1 group by col1 limit 1;
> select distinct on (col1) col1 from table1 limit 1;
>
> select col1 from table1 group by col1 limit 2;
> select distinct on (col1) col1 from table1 limit 2;
>
> Performing any of these following queries results in a full sequential scan, followed by a hash aggregate, and then the limit.  An optimization could be to stop the sequential scan as soon as the limit of results has been reached.  Am I missing something?

Yes, that would be an optimization. Unfortunately currently the
aggregation logic doesn't have special case logic to start outputting
tuples immediately when no aggregate functions are in use. In
principle it's possible to teach it to do that, peeking at the code it
seems that it wouldn't even be too hard to implement.

Currently your best options are to add an indexes for columns that you
select distinct values from, use a server side cursor and do the
distinct operation on the client (might need to adjust
cursor_tuple_fraction before doing the query to make cost estimates
better) or use a stored procedure to do the cursor + manual distinct
trick.

> Similarly, the following query results in a sequential scan:
>
> select * from table1 where col1 <> col1;

PostgreSQL query optimizer doesn't try to be a theorem prover and so
doesn't deduce the logical impossibility. For most queries, looking
for nonsensical would be a complete waste of time. The optimizer does
notice impossibilities that crop up during constant propagation, so
WHERE false or WHERE 0 = 1 would work fine. It would be best to fix
Sequel to output literal constant false for PostgreSQL. However, I
wonder if it's worth checking for this very specific case because it
is a common idiom for Oracle users to implement constant false in
where predicates due to Oracle not allowing top level literal booleans
for some arcane reason or another.

Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2012-03-28 14:13:01 Re: Distinct + Limit
Previous Message Francois Deliege 2012-03-27 20:54:36 Re: Distinct + Limit