Re: Q: will GROUP BY make use of an index to return tuples early?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gunther Schadow <gunther(at)aurora(dot)regenstrief(dot)org>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Q: will GROUP BY make use of an index to return tuples early?
Date: 2002-06-09 18:57:06
Message-ID: 6664.1023649026@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Gunther Schadow <gunther(at)aurora(dot)regenstrief(dot)org> writes:
> SELECT id, MIN(foo.time)
> FROM Foo foo
> GROUP BY foo.id;
> can one tell the query executor to do an index scan on Foo_id_idx

That is a considered plan type. For example, using the regression
test database:

regression=# set enable_seqscan TO 0;
SET
regression=# explain analyze select hundred,min(unique1) from tenk1 group by hundred;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=0.00..1121.89 rows=1000 width=8) (actual time=19.12..1619.71 rows=100 loops=1)
-> Group (cost=0.00..1096.89 rows=10000 width=8) (actual time=0.56..1484.14 rows=10000 loops=1)
-> Index Scan using tenk1_hundred on tenk1 (cost=0.00..1071.89 rows=10000 width=8) (actual time=0.53..1194.64 rows=10000 loops=1)
Total runtime: 1621.69 msec
(4 rows)
regression=# set enable_seqscan TO 1;
SET
regression=# explain analyze select hundred,min(unique1) from tenk1 group by hundred;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=997.39..1072.39 rows=1000 width=8) (actual time=1162.52..1677.46 rows=100 loops=1)
-> Group (cost=997.39..1047.39 rows=10000 width=8) (actual time=1157.33..1565.75 rows=10000 loops=1)
-> Sort (cost=997.39..1022.39 rows=10000 width=8) (actual time=1157.29..1298.10 rows=10000 loops=1)
Sort Key: hundred
-> Seq Scan on tenk1 (cost=0.00..333.00 rows=10000 width=8) (actual time=0.16..277.69 rows=10000 loops=1)
Total runtime: 1695.83 msec
(6 rows)

The sort-based plan usually comes out ahead in estimated total cost, but
if you were using a cursor to read it then I believe the indexscan plan
would get chosen due to its much lower startup cost.

> If the answer is yes, there is a way, then how about if we do this:
> CREATE INDEX Foo_id_time_idx ON Foo(id, time);
> now, considering the same query, it could be executed even faster,
> because we could do an index scan on Foo_id_time_idx and only need
> to consider the first data tuple of every Foo.id group (because
> the ordering of Foo_id_time_idx guarantees that the MIN(time) is
> in the first tuple.

This won't happen because (a) the optimizer treats aggregates as black
boxes; there is no notion that some aggregates have a connection to the
sort ordering of some indexes; (b) the indexscan would still have to
pass over the other time values in each group.

regards, tom lane

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Ian Cass 2002-06-09 18:58:19 Re: select failure
Previous Message Gunther Schadow 2002-06-09 18:10:52 Re: select failure