Re: Index Onlys Scan for expressions

From: Ildar Musin <i(dot)musin(at)postgrespro(dot)ru>
To: Vladimir Sitnikov <sitnikov(dot)vladimir(at)gmail(dot)com>
Cc: Oleg Bartunov <obartunov(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Re: Index Onlys Scan for expressions
Date: 2016-09-02 17:23:35
Message-ID: d12e0c0e-1497-5152-d316-c6e914efdd21@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hi Vladimir,

On 23.08.2016 23:35, Vladimir Sitnikov wrote:
> Hi,
>
> I've tried your indexonlypatch5.patch against REL9_6_BETA3.
> Here are some results.
>
> TL;DR:
> 1) <<where type=42 and upper(vc) like '%ABC%'>> does not support
> index-only scan for index (type, upper(vc) varchar_pattern_ops).
> 3) <<(... where type=42 offset 0) where upper_vc like '%ABC%'>> does
> trigger index-only scan. IOS reduces number of buffers from 977 to 17
> and that is impressive.
>
> Can IOS be used for simple query like #1 as well?
>

Thanks for checking out the patch. Sorry for the delayed reply.

> Here are the details.
>
> drop table vlsi;
> create table vlsi(type numeric, vc varchar(500));
> insert into vlsi(type,vc) select round(x/1000),
> md5('||x)||md5('||x+1)||md5(''||x+2) from generate_series(1,1000000) as
> s(x);
> create index type_vc__vlsi on vlsi(type, upper(vc) varchar_pattern_ops);
> vacuum analyze vlsi;
>
> 0) Smoke test (index only scan works when selecting indexed expression):
>
> explain (analyze, buffers) select type, upper(vc) from vlsi where type=42;
>
> Index Only Scan using type_vc__vlsi on vlsi (cost=0.55..67.97 rows=971
> width=36) (actual time=0.012..0.212 rows=1000 loops=1)
> Index Cond: (type = '42'::numeric)
> Heap Fetches: 0
> Buffers: shared hit=17
> Planning time: 0.112 ms
> Execution time: 0.272 ms
>
> 1) When trying to apply "like condition", index only scan does not work.
> Note: "buffers hit" becomes 977 instead of 17.
>
> explain (analyze, buffers) select type, upper(vc) from vlsi where
> type=42 and upper(vc) like '%ABC%';
>
> Index Scan using type_vc__vlsi on vlsi (cost=0.55..1715.13 rows=20
> width=36) (actual time=0.069..1.343 rows=23 loops=1)
> Index Cond: (type = '42'::numeric)
> Filter: (upper((vc)::text) ~~ '%ABC%'::text)
> Rows Removed by Filter: 977
> Buffers: shared hit=939
> Planning time: 0.104 ms
> Execution time: 1.358 ms
>

The reason why this doesn't work is that '~~' operator (which is a
synonym for 'like') isn't supported by operator class for btree. Since
the only operators supported by btree are <, <=, =, >=, >, you can use
it with queries like:

explain (analyze, buffers) select type, upper(vc) from vlsi where
type=42 and upper(vc) ~~ 'ABC%';
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
Index Only Scan using type_vc__vlsi on vlsi (cost=0.55..4.58 rows=1
width=36) (actual time=0.021..0.021 rows=0 loops=1)
Index Cond: ((type = '42'::numeric) AND ((upper((vc)::text)) ~>=~
'ABC'::text) AND ((upper((vc)::text)) ~<~ 'ABD'::text))
Filter: ((upper((vc)::text)) ~~ 'ABC%'::text)
Heap Fetches: 0
Buffers: shared hit=4
Planning time: 0.214 ms
Execution time: 0.044 ms
(7 rows)

In case of fixed prefix postgres implicitly substitutes '~~' operator
with two range operators:

((upper((vc)::text)) ~>=~ 'ABC'::text) AND ((upper((vc)::text)) ~<~
'ABD'::text)

so that you can use these conditions to lookup in btree.

> Mere "subquery" does not help: still no index-only scan
>
> 2) explain (analyze, buffers) select * from (select type, upper(vc)
> upper_vc from vlsi where type=42) as x where upper_vc like '%ABC%';
>
> Index Scan using type_vc__vlsi on vlsi (cost=0.55..1715.13 rows=20
> width=36) (actual time=0.068..1.344 rows=23 loops=1)
> Index Cond: (type = '42'::numeric)
> Filter: (upper((vc)::text) ~~ '%ABC%'::text)
> Rows Removed by Filter: 977
> Buffers: shared hit=939
> Planning time: 0.114 ms
> Execution time: 1.357 ms
>
> 3) "offset 0" trick does help:
> explain (analyze, buffers) select * from (select type, upper(vc)
> upper_vc from vlsi where type=42 offset 0) as x where upper_vc like '%ABC%';
>
> Subquery Scan on x (cost=0.55..80.11 rows=39 width=36) (actual
> time=0.033..0.488 rows=23 loops=1)
> Filter: (x.upper_vc ~~ '%ABC%'::text)
> Rows Removed by Filter: 977
> Buffers: shared hit=17
> -> Index Only Scan using type_vc__vlsi on vlsi (cost=0.55..67.97
> rows=971 width=36) (actual time=0.015..0.210 rows=1000 loops=1)
> Index Cond: (type = '42'::numeric)
> Heap Fetches: 0
> Buffers: shared hit=17
> Planning time: 0.086 ms
> Execution time: 0.503 ms
>
> Vladimir

I debugged the last two queries to figure out the difference between
them. It turned out that that the query 2) transforms to the same as
query 1). And in 3rd query 'OFFSET' statement prevents rewriter from
transforming the query, so it is possible to use index only scan on
subquery and then filter the result of subquery with '~~' operator.

--
Ildar Musin
i(dot)musin(at)postgrespro(dot)ru

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2016-09-02 17:47:22 Re: pgbench - allow to store select results into variables
Previous Message Andres Freund 2016-09-02 17:21:42 Re: Logical decoding slots can go backwards when used from SQL, docs are wrong