Quick Links

DISTINCT with btree skip scan

From:	Thomas Munro <munro(at)ip9(dot)org>
To:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	DISTINCT with btree skip scan
Date:	2014-07-05 00:17:32
Message-ID:	CADLWmXXbTSBxP-MzJuPAYSsL_2f0iPm5VWPbCvDbVvfX93FKkw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hello

As an exercise I hacked up the simplest code I could think of that would
demonstrate a faster DISTINCT based on skipping ahead to the next distinct
value in an index-only scan. Please see the attached (extremely buggy)
patch, and the example session below. (It's against my natural instinct to
send such half-baked early hacking phase code to the list, but thought it
would make sense to demo the concept and then seek advice, warnings, cease
and desist notices etc before pressing on down that route!) I would be
most grateful for any feedback you might have.

Best regards,

Thomas Munro

postgres=# create table foo (a int, b int, primary key (a, b));
CREATE TABLE
Time: 9.002 ms
postgres=# insert into foo select generate_series(1, 5000000) % 10,
generate_series(1, 5000000);
INSERT 0 5000000
Time: 35183.444 ms
postgres=# analyze foo;
ANALYZE
Time: 493.168 ms

postgres=# set enable_hashagg = true;
SET
Time: 0.274 ms
postgres=# explain select distinct a from foo;
┌───────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────────────────────────┤
│ HashAggregate (cost=84624.00..84624.10 rows=10 width=4) │
│ Group Key: a │
│ -> Seq Scan on foo (cost=0.00..72124.00 rows=5000000 width=4) │
│ Planning time: 0.065 ms │
└───────────────────────────────────────────────────────────────────┘
(4 rows)

Time: 0.500 ms
postgres=# select distinct a from foo;
┌───┐
│ a │
├───┤
│ 6 │
│ 8 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
│ 9 │
│ 7 │
│ 0 │
└───┘
(10 rows)

Time: 2017.019 ms

postgres=# set enable_hashagg = false;
SET
Time: 0.302 ms
postgres=# explain select distinct a from foo;
┌─────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN
│
├─────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Only Scan for distinct prefix 1 using foo_pkey on foo
(cost=0.43..263354.20 rows=10 width=4) │
│ Planning time: 0.063 ms
│
└─────────────────────────────────────────────────────────────────────────────────────────────────────┘
(2 rows)

Time: 0.443 ms
postgres=# select distinct a from foo;
┌───┐
│ a │
├───┤
│ 0 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
│ 6 │
│ 7 │
│ 8 │
│ 9 │
└───┘
(10 rows)

Time: 0.565 ms

Attachment	Content-Type	Size
distinct-skip.patch	text/x-patch	19.0 KB

Responses

Re: DISTINCT with btree skip scan at 2014-07-05 01:03:05 from Vik Fearing
Re: DISTINCT with btree skip scan at 2014-10-27 07:24:24 from David Rowley

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Vik Fearing	2014-07-05 01:03:05	Re: DISTINCT with btree skip scan
Previous Message	Kouhei Kaigai	2014-07-04 21:38:05	Re: RLS Design