From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Bill Moran <wmoran(at)potentialtech(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: SELECT DISTINCT never uses an index? |
Date: | 2016-07-07 22:53:33 |
Message-ID: | 1564.1467932013@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> The alternative worth considering is presumably something like:
> GroupAggregate
> -> Index Only Scan on grue_size
> Scanning an entire index in order is pretty expensive, but it seems
> possible that this could be faster than the Seq Scan, especially on a
> table with other wide columns, because then the index might be a lot
> smaller than the table. Even if the index traversal generates some
> random I/O, if it's sufficiently smaller than the table you will still
> come out ahead. I'm not positive that the planner will actually
> consider this plan,
Of course it does. Simple example in the regression database:
regression=# explain select distinct unique1 from tenk1;
QUERY PLAN
--------------------------------------------------------------------------------
------------
Unique (cost=0.29..295.29 rows=10000 width=4)
-> Index Only Scan using tenk1_unique1 on tenk1 (cost=0.29..270.29 rows=100
00 width=4)
(2 rows)
I think though that this depends on being an IOS, with a fairly wide and
all-all-visible table, in order for the cost estimate to come out cheaper
than a seqscan. If you disable IOS then the planner's second choice is
a seqscan:
regression=# set enable_indexonlyscan to 0;
SET
regression=# explain select distinct unique1 from tenk1;
QUERY PLAN
-----------------------------------------------------------------
HashAggregate (cost=483.00..583.00 rows=10000 width=4)
Group Key: unique1
-> Seq Scan on tenk1 (cost=0.00..458.00 rows=10000 width=4)
(3 rows)
A whole-table plain indexscan, or IOS with any significant number of heap
probes needed, is not going to be preferred over a seqscan because of the
amount of random I/O it implies.
> We're probably missing a few tricks on queries of this type. If the
> index-traversal machinery had a mechanism to skip quickly to the next
> distinct value, that could be used here:
Yeah, I suspect Bill was imagining that that sort of plan could be
used; but it requires execution machinery we have not got.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Simon Riggs | 2016-07-07 23:15:19 | Re: A Modest Upgrade Proposal |
Previous Message | Pete Stevenson | 2016-07-07 22:48:10 | Re: MVCC overheads |