Re: Index Skip Scan

From: Floris Van Nee <florisvannee(at)Optiver(dot)com>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>
Cc: Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "Alexander Kuzmenkov" <a(dot)kuzmenkov(at)postgrespro(dot)ru>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Tomas Vondra" <tomas(dot)vondra(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Bhushan Uparkar <bhushan(dot)uparkar(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, James Coleman <jtc331(at)gmail(dot)com>
Subject: Re: Index Skip Scan
Date: 2019-06-05 19:39:29
Message-ID: 1559763570329.85227@Optiver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> To address this, probably we can do something like in the attached patch.
> Altogether with distinct_pathkeys uniq_distinct_pathkeys are stored, which is
> the same, but without the constants elimination. It's being used then for
> getting the real number of distinct keys, and to check the order of the columns
> to not consider index skip scan if it's different. Hope it doesn't
> look too hacky.
>

Thanks! I've verified that it works now.
I was wondering if we're not too strict in some cases now though. Consider the following queries:

postgres=# explain(analyze) select distinct on (m,f) m,f from t where m='M2';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Index Only Scan using t_m_f_t_idx on t (cost=0.29..11.60 rows=40 width=5) (actual time=0.056..0.469 rows=10 loops=1)
Scan mode: Skip scan
Index Cond: (m = 'M2'::text)
Heap Fetches: 10
Planning Time: 0.095 ms
Execution Time: 0.490 ms
(6 rows)

postgres=# explain(analyze) select distinct on (f) m,f from t where m='M2';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=0.29..849.83 rows=10 width=5) (actual time=0.088..10.920 rows=10 loops=1)
-> Index Only Scan using t_m_f_t_idx on t (cost=0.29..824.70 rows=10052 width=5) (actual time=0.087..8.524 rows=10000 loops=1)
Index Cond: (m = 'M2'::text)
Heap Fetches: 10000
Planning Time: 0.078 ms
Execution Time: 10.944 ms
(6 rows)

This is basically the opposite case - when distinct_pathkeys matches the filtered list of index keys, an index skip scan could be considered. Currently, the user needs to write 'distinct m,f' explicitly, even though he specifies in the WHERE-clause that 'm' can only have one value anyway. Perhaps it's fine like this, but it could be a small improvement for consistency.

-Floris?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2019-06-05 20:17:25 Add CREATE DATABASE LOCALE option
Previous Message Robert Haas 2019-06-05 19:30:50 Re: Fix runtime errors from -fsanitize=undefined