From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com> |
Cc: | Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Floris Van Nee <florisvannee(at)optiver(dot)com>, James Coleman <jtc331(at)gmail(dot)com>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bhushan Uparkar <bhushan(dot)uparkar(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> |
Subject: | Re: Index Skip Scan |
Date: | 2019-07-02 09:00:06 |
Message-ID: | CA+hUKGKo30N5VNuRWhDuMGVZ3hTcv4J5RGXe286GRJZLk_jBYQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Jun 21, 2019 at 1:20 AM Jesper Pedersen
<jesper(dot)pedersen(at)redhat(dot)com> wrote:
> Attached is v20, since the last patch should have been v19.
I took this for a quick spin today. The DISTINCT ON support is nice
and I think it will be very useful. I've signed up to review it and
will have more to say later. But today I had a couple of thoughts
after looking into how src/backend/optimizer/plan/planagg.c works and
wondering how to do some more skipping tricks with the existing
machinery.
1. SELECT COUNT(DISTINCT i) FROM t could benefit from this (as could
AVG(DISTINCT ...) or any other DISTINCT aggregate). Right now you get a
seq scan, with the sort/unique logic inside the Aggregate node. If you
write SELECT COUNT(*) FROM (SELECT DISTINCT i FROM t) ss then you get
a skip scan that is much faster in good cases. I suppose you could
have a process_distinct_aggregates() in planagg.c that recognises
queries of the right form and generates extra paths a bit like
build_minmax_path() does, though it's probably better to consider
that in the grouping planner proper instead. I'm not sure.
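
To illustrate why the skip-scan form of point 1 can win, here is a minimal sketch in Python, not PostgreSQL code. It assumes the index on t(i) can be modelled as a sorted list, and the function name count_distinct_skip is hypothetical. Instead of visiting every entry, it does one binary search per distinct value, which is roughly what a skip scan does by re-descending the btree.

```python
from bisect import bisect_right

def count_distinct_skip(sorted_keys):
    """Count distinct values in a sorted list by jumping over duplicate runs.

    sorted_keys stands in for an ordered index on t(i); this is an
    illustrative assumption, not how the patch is implemented.
    """
    count = 0
    pos = 0
    n = len(sorted_keys)
    while pos < n:
        count += 1
        # "Skip": binary-search for the first entry greater than the
        # current key, rather than stepping through every duplicate.
        pos = bisect_right(sorted_keys, sorted_keys[pos], pos, n)
    return count

keys = sorted([3, 1, 1, 2, 3, 3, 3, 1])
print(count_distinct_skip(keys))
```

The work is proportional to the number of distinct values times the log of the list length, so it only pays off when there are few distinct values relative to rows, which matches the "good cases" caveat above.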
2. SELECT i, MIN(j) FROM t GROUP BY i could benefit from this if
you're allowed to go forwards. Same for SELECT i, MAX(j) FROM t GROUP
BY i if you're allowed to go backwards. Those queries are equivalent
to SELECT DISTINCT ON (i) i, j FROM t ORDER BY i [DESC], j [DESC]
(though as Floris noted, the backwards version gives the wrong answers
with v20). That does seem like a much more specific thing applicable
only to MIN and MAX, and I think preprocess_minmax_aggregates() could
be taught to handle that sort of query, building an index only scan
path with skip scan in build_minmax_path().
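
The equivalence in point 2 can be sketched the same way. The following Python is only an illustration under stated assumptions: the index on t(i, j) is modelled as a sorted list of (i, j) tuples, j is assumed numeric so that math.inf works as a sentinel, and the names min_per_group and max_per_group are hypothetical. Going forwards, the first entry of each group carries MIN(j); going backwards, the last entry carries MAX(j).

```python
import math
from bisect import bisect_left, bisect_right

def min_per_group(index):
    """Forward skip: the first entry of each i-group holds MIN(j)."""
    result = []
    pos = 0
    while pos < len(index):
        i, j = index[pos]
        result.append((i, j))
        # Jump to the first entry whose leading key is greater than i.
        pos = bisect_right(index, (i, math.inf), pos)
    return result

def max_per_group(index):
    """Backward skip: the last entry of each i-group holds MAX(j)."""
    result = []
    pos = len(index) - 1
    while pos >= 0:
        i, j = index[pos]
        result.append((i, j))
        # Find the first entry of group i, then step to the entry before it,
        # which is the last entry of the previous group.
        pos = bisect_left(index, (i, -math.inf), 0, pos + 1) - 1
    return result

# Stand-in for an index on t(i, j); sorted() gives (i, j) order.
idx = sorted([(1, 5), (1, 2), (2, 9), (2, 7), (3, 4)])
print(min_per_group(idx))
print(max_per_group(idx))
```

Note that the backward version returns groups in descending order of i, mirroring ORDER BY i DESC, j DESC in the DISTINCT ON formulation.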
--
Thomas Munro
https://enterprisedb.com