DISTINCT with btree skip scan

From: Thomas Munro <munro(at)ip9(dot)org>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: DISTINCT with btree skip scan
Date: 2014-07-05 00:17:32
Message-ID: CADLWmXXbTSBxP-MzJuPAYSsL_2f0iPm5VWPbCvDbVvfX93FKkw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello

As an exercise I hacked up the simplest code I could think of that would
demonstrate a faster DISTINCT based on skipping ahead to the next distinct
value in an index-only scan. Please see the attached (extremely buggy)
patch, and the example session below. (It's against my natural instinct to
send such half-baked early hacking phase code to the list, but thought it
would make sense to demo the concept and then seek advice, warnings, cease
and desist notices etc before pressing on down that route!) I would be
most grateful for any feedback you might have.

Best regards,

Thomas Munro

postgres=# create table foo (a int, b int, primary key (a, b));
CREATE TABLE
Time: 9.002 ms
postgres=# insert into foo select generate_series(1, 5000000) % 10,
generate_series(1, 5000000);
INSERT 0 5000000
Time: 35183.444 ms
postgres=# analyze foo;
ANALYZE
Time: 493.168 ms

postgres=# set enable_hashagg = true;
SET
Time: 0.274 ms
postgres=# explain select distinct a from foo;
┌───────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
├───────────────────────────────────────────────────────────────────┤
│ HashAggregate (cost=84624.00..84624.10 rows=10 width=4) │
│ Group Key: a │
│ -> Seq Scan on foo (cost=0.00..72124.00 rows=5000000 width=4) │
│ Planning time: 0.065 ms │
└───────────────────────────────────────────────────────────────────┘
(4 rows)

Time: 0.500 ms
postgres=# select distinct a from foo;
┌───┐
│ a │
├───┤
│ 6 │
│ 8 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
│ 9 │
│ 7 │
│ 0 │
└───┘
(10 rows)

Time: 2017.019 ms

postgres=# set enable_hashagg = false;
SET
Time: 0.302 ms
postgres=# explain select distinct a from foo;
┌─────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN

├─────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Only Scan for distinct prefix 1 using foo_pkey on foo
(cost=0.43..263354.20 rows=10 width=4) │
│ Planning time: 0.063 ms

└─────────────────────────────────────────────────────────────────────────────────────────────────────┘
(2 rows)

Time: 0.443 ms
postgres=# select distinct a from foo;
┌───┐
│ a │
├───┤
│ 0 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
│ 6 │
│ 7 │
│ 8 │
│ 9 │
└───┘
(10 rows)

Time: 0.565 ms

Attachment Content-Type Size
distinct-skip.patch text/x-patch 19.0 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vik Fearing 2014-07-05 01:03:05 Re: DISTINCT with btree skip scan
Previous Message Kouhei Kaigai 2014-07-04 21:38:05 Re: RLS Design