Re: Avoid full GIN index scan when possible

From: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Marc Cousin <cousinmarc(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: Avoid full GIN index scan when possible
Date: 2019-08-03 01:51:16
Message-ID: a6e1564e-98c8-e420-cfaf-0309316c4910@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Attached 6th version of the patches.

On 01.08.2019 22:28, Tom Lane wrote:

> Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> writes:
>> On Thu, Aug 1, 2019 at 9:59 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> While I've not attempted to fix that here, I wonder whether we shouldn't
>>> fix it by just forcing forcedRecheck to true in any case where we discard
>>> an ALL qualifier.
>>>
>> +1 for setting forcedRecheck in any case we discard ALL qualifier.
>> ISTM, real life number of cases we can skip recheck here is
>> negligible. And it doesn't justify complexity.
> Yeah, that was pretty much what I was thinking --- by the time we got
> it fully right considering nulls and multicolumn indexes, the cases
> where not rechecking could actually do something useful would be
> pretty narrow. And a bitmap heap scan is always going to have to
> visit the heap, IIRC, so how much could skipping the recheck really
> save?

I have simplified patch #1 setting forcedRecheck for all discarded ALL quals.
(This solution is very close to the earliest unpublished version of the patch.)

More accurate recheck-forcing logic was moved into patch #2 (multicolumn
indexes were fixed). This patch also contains ginFillScanKey() refactoring
and new internal mode GIN_SEARCH_MODE_NOT_NULL that is used only for
GinScanKey.xxxConsistentFn initialization and transformed into
GIN_SEARCH_MODE_ALL before GinScanEntry initialization.

The cost estimation seems to be correct for both patch #1 and patch #2 and
left untouched since v05.

>>> BTW, it's not particularly the fault of this patch, but: what does it
>>> even mean to specify GIN_SEARCH_MODE_ALL with a nonzero number of keys?
>>>
>> It might mean we would like to see all the results, which don't
>> contain given key.
> Ah, right, I forgot that the consistent-fn might look at the match
> results.

Also I decided to go further and tried to optimize (patch #3) the case for
GIN_SEARCH_MODE_ALL with a nonzero number of keys.

Full GIN scan can be avoided in queries like this contrib/intarray query:
"arr @@ '1' AND arr @@ '!2'" (search arrays containing 1 and not containing 2).

Here we have two keys:
- key '1' with GIN_SEARCH_MODE_DEFAULT
- key '2' with GIN_SEARCH_MODE_ALL

Key '2' requires full scan that can be avoided with the forced recheck.

This query is equivalent to single-qual query "a @@ '1 & !2'" which
emits only one GIN key '1' with recheck.

Below is example for contrib/intarray operator @@:

=# CREATE EXTENSION intarray;
=# CREATE TABLE t (a int[]);
=# INSERT INTO t SELECT ARRAY[i] FROM generate_series(1, 1000000) i;
=# CREATE INDEX ON t USING gin (a gin__int_ops);
=# SET enable_seqscan = OFF;

-- master
=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t WHERE a @@ '1' AND a @@ '!2';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=16000095.45..16007168.16 rows=5019 width=24) (actual time=66.955..66.956 rows=1 loops=1)
Recheck Cond: ((a @@ '1'::query_int) AND (a @@ '!2'::query_int))
Heap Blocks: exact=1
Buffers: shared hit=6816
-> Bitmap Index Scan on t_a_idx (cost=0.00..16000094.19 rows=5019 width=0) (actual time=66.950..66.950 rows=1 loops=1)
Index Cond: ((a @@ '1'::query_int) AND (a @@ '!2'::query_int))
Buffers: shared hit=6815
Planning Time: 0.086 ms
Execution Time: 67.076 ms
(9 rows)

-- equivalent single-qual query
=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t WHERE a @@ '1 & !2';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=78.94..7141.57 rows=5025 width=24) (actual time=0.019..0.019 rows=1 loops=1)
Recheck Cond: (a @@ '1 & !2'::query_int)
Heap Blocks: exact=1
Buffers: shared hit=8
-> Bitmap Index Scan on t_a_idx (cost=0.00..77.68 rows=5025 width=0) (actual time=0.014..0.014 rows=1 loops=1)
Index Cond: (a @@ '1 & !2'::query_int)
Buffers: shared hit=7
Planning Time: 0.056 ms
Execution Time: 0.039 ms
(9 rows)

-- with patch #3
=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t WHERE a @@ '1' AND a @@ '!2';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t (cost=75.45..7148.16 rows=5019 width=24) (actual time=0.019..0.020 rows=1 loops=1)
Recheck Cond: ((a @@ '1'::query_int) AND (a @@ '!2'::query_int))
Heap Blocks: exact=1
Buffers: shared hit=6
-> Bitmap Index Scan on t_a_idx (cost=0.00..74.19 rows=5019 width=0) (actual time=0.011..0.012 rows=1 loops=1)
Index Cond: ((a @@ '1'::query_int) AND (a @@ '!2'::query_int))
Buffers: shared hit=5
Planning Time: 0.059 ms
Execution Time: 0.040 ms
(9 rows)

Patch #3 again contains a similar ugly solution -- we have to remove already
initialized GinScanKeys with theirs GinScanEntries. GinScanEntries can be
shared, so the reference counting was added.

Also modifications of cost estimation in patch #3 are questionable.
GinQualCounts are simply not incremented when haveFullScan flag is set,
because the counters anyway will be overwritten by the caller.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0001-Avoid-GIN-full-scan-for-empty-ALL-keys-v06.patch text/x-patch 11.2 KB
0002-Force-GIN-recheck-more-accurately-v06.patch text/x-patch 14.4 KB
0003-Avoid-GIN-full-scan-for-non-empty-ALL-keys-v06.patch text/x-patch 8.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2019-08-03 04:05:36 Re: POC: Cleaning up orphaned files using undo logs
Previous Message Andres Freund 2019-08-03 01:08:02 Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions