Re: DISTINCT with btree skip scan

From: Thomas Munro <munro(at)ip9(dot)org>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: DISTINCT with btree skip scan
Date: 2014-10-31 21:35:52
Message-ID: CADLWmXWALK8NPZqdnRQiPnrzAnic7NxYKynrkzO_vxYr8enWww@mail.gmail.com
Lists: pgsql-hackers

On 27 October 2014 20:24, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> I've had a quick look at this and it seems like a great win! I'm quite
> surprised that we've not got this already. I think this technology could
> also really help performance of queries such as SELECT * from bigtable bt
> WHERE EXISTS(SELECT 1 FROM otherbigtable obt WHERE bt.somecol =
> obt.someIndexedButNonUniqueColumn); I know you're not proposing to improve
> that first off, but it could be done later once the benefits of this are
> more commonly realised.
>
> I think our shortfalls in this area have not gone unnoticed. I was reading
> this post
> https://www.periscope.io/blog/count-distinct-in-mysql-postgres-sql-server-and-oracle.html
> about comparing performance of COUNT(DISTINCT col). I think this would give
> a big win for query 3 in that post. I'm trying to find some time make some
> changes to transform queries to allow the group by to happen before the
> joins when possible, so between that and your patch we'd be 2 steps closer
> to making query 1 in the link above perform a little better on PostgreSQL.
>
> Do you think you'll manage to get time to look at this a bit more? I'd be
> keen to look into the costing side of it if you think that'll help you any?

Hi David,

Sorry for the silence; I have been busy moving countries.

I am definitely interested in collaborating on a series of patches to
implement various kinds of skip-based plans as seen in other RDBMSs, if
others think it could be useful. I see skip-based DISTINCT as a good
place to start. (I suspect the semi-related skip scan plan, for the
case where your WHERE clause doesn't have a qual for the leading
column(s), is much harder to plan and execute, and I also suspect it's
less generally useful.)

Here is a rebased version of that patch; it fixes a crashing
off-by-one error and, I think, handles backward scans properly. This
is still just a quick hack to play with the general idea at this
stage.

It works by adding a new index operation 'skip', which the executor
code can use during a scan to advance to the next value (for some
prefix of the index's columns). That may be a terrible idea and
totally unnecessary... but let me explain my reasoning:

1. Perhaps some index implementations can do something better than a
search for the next key value from the root. Is it possible or
desirable to use the current position as a starting point for a btree
traversal? I don't know.

2. It seemed that I'd need to create a new search ScanKey to use the
'rescan' interface for skipping to the next value, but I already had
an insertion ScanKey so I wanted a way to just reuse that. But maybe
there is some other way to reuse existing index interfaces, or maybe
there is an easy way to make a new search ScanKey from the existing
insertion ScanKey?
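To make that concrete, here is roughly the shape I have in mind. All
the names below (index_skip, ioss_SkipPrefixCols) are placeholders I
made up for illustration, not necessarily what the attached patch
actually uses:

/*
 * Illustrative sketch only.  index_skip() and ioss_SkipPrefixCols are
 * placeholder names; the attached patch may spell things differently.
 *
 * Hypothetical index AM entry point: advance the scan past all remaining
 * entries sharing the current value of the first prefix_cols index
 * columns, returning false when the scan is exhausted.
 */
bool index_skip(IndexScanDesc scan, ScanDirection direction, int prefix_cols);

/*
 * Executor side (nodeIndexonlyscan.c-ish sketch): emit one tuple per
 * distinct prefix by skipping, instead of stepping over every entry.
 */
while (index_getnext_tid(scandesc, direction) != NULL)
{
    /* ... form and return the index-only tuple for this prefix ... */

    if (!index_skip(scandesc, direction, node->ioss_SkipPrefixCols))
        break;              /* no further distinct prefix values */
}

For btree, the skip itself could either re-descend from the root using
the insertion ScanKey built from the current position, or try to step
within the current page first; that choice is exactly the question in
point 1 above.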

Currently it uses the existing IndexOnlyScan plan node, adding an
extra member. I briefly tried making a different plan called
DistinctIndexScan but it seemed I was going to need to duplicate or
refactor/share a lot of IndexOnlyScan code (this might be the right
thing to do but at this stage I'm just trying to demo the general idea
with minimal changes).
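Concretely, the extra member amounts to something like the following;
the field name here is made up for illustration and may not match what
is in the attached patch:

typedef struct IndexOnlyScan
{
    Scan        scan;
    Oid         indexid;            /* OID of index to scan */
    /* ... existing fields unchanged ... */
    int         distinctPrefixLen;  /* illustrative: number of leading
                                     * index columns to emit distinct
                                     * values of; 0 means an ordinary
                                     * index-only scan */
} IndexOnlyScan;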

As for costing, I haven't looked at that part of PG at all yet. If
you *can* do a distinct index scan, would you *always* want to? How
well could we estimate the cost using existing statistics? I suppose
the expected number of page fetches is proportional to the expected
number of distinct values (i.e. skip operations) times btree depth, and
the cost may be adjusted in some way for cache effects (?), but how
well can we estimate the number of distinct values of (a), (a, b) or
(a, b, c) in some range for an index on (a, b, c, d)?
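As a starting point, the naive estimate I have in mind is just the
arithmetic above, ignoring cache effects; everything here is
illustrative, and getting a usable ndistinct estimate for the prefix is
the hard part:

/*
 * Rough sketch of the page-fetch estimate: one root-to-leaf descent per
 * distinct prefix value.  Names and structure are illustrative only.
 */
static double
distinct_scan_page_fetches(double ndistinct_prefix, int btree_height)
{
    /* inner pages on the way down plus one leaf page per skip */
    return ndistinct_prefix * (btree_height + 1);
}

For example, 10 distinct values in a 3-level btree would be around 40
page fetches, versus visiting every leaf page for a plain index-only
scan.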

As before, you still need the kludge of disabling hashagg to reach the new plan:

postgres=# set enable_hashagg = true;
SET
Time: 0.213 ms
postgres=# select distinct a from foo;
┌───┐
│ a │
├───┤
│ 6 │
│ 8 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
│ 9 │
│ 7 │
│ 0 │
└───┘
(10 rows)

Time: 890.166 ms
postgres=# set enable_hashagg = false;
SET
Time: 0.271 ms
postgres=# select distinct a from foo;
┌───┐
│ a │
├───┤
│ 0 │
│ 1 │
│ 2 │
│ 3 │
│ 4 │
│ 5 │
│ 6 │
│ 7 │
│ 8 │
│ 9 │
└───┘
(10 rows)

Time: 0.573 ms

Best regards,
Thomas Munro

Attachment Content-Type Size
distinct-skip-v2.patch text/x-patch 19.6 KB
