Re: Avoid full GIN index scan when possible

From: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Marc Cousin <cousinmarc(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: Avoid full GIN index scan when possible
Date: 2019-12-27 01:36:14
Message-ID: b53614eb-6f9f-8c5c-9df8-f703b0b102b6@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 26.12.2019 4:59, Alexander Korotkov wrote:
>
>  I've tried to add patch #4 to comparison, but I've catch assertion
> failure.
>
> TRAP: FailedAssertion("key->includeNonMatching", File: "ginget.c",
> Line: 1340)
There simply should be inverted condition in the assertion:
Assert(!key->includeNonMatching);

I have looked at v9 patch, and here is my review:

1. I agree with NULL-flag handling simplifications in ginNewScanKey(),
ginScanKeyAddHiddenEntry() extraction.

2. I also agree that usage of nrequired/nadditional in keyGetItem() is a more
natural solution to implement exclusion keys than my previous attempt of doing
that in scanGetKey().

But there are some questions:

Can we avoid referencing excludeOnly flag keyGetItem() by replacing these
references with !nrequired?

Maybe it would be better to move the whole block of keyGetItem() code
starting from the first loop over required keys and ending before the loop over
additional keys inside 'if (key->nrequired) { ... }'?

Can we avoid introducing excludeOnly flag by reusing searchMode and/or by
moving the initialization of nrequired/nadditional into ginNewScanKey()?

3. The following two times repeated NULL-filtering check looks too complicated
and needs to be refactored somehow:

- res = key->triConsistentFn(key);
+ if (key->excludeOnly &&
+ key->nuserentries < key->nentries &&
+ key->scanEntry[key->nuserentries]->queryCategory == GIN_CAT_NULL_KEY &&
+ key->entryRes[key->nuserentries] == GIN_TRUE)
+ res = GIN_FALSE;
+ else
+ res = key->triConsistentFn(key);

For example, a special consistentFn() can be introduced for such NOT_NULL
scankeys. Or even a hidden separate one-entry scankey with a trivial
consistentFn() can be added instead of adding hidden entry.

4. forcedRecheck flag that was previously used for discarded empty ALL scankeys
is removed now. 0-entry exclusion keys can appear instead, and their
consistentFn() simply returns constant value. Could this lead to tangible
overhead in some cases (in comparison to forcedRecheck flag)?

5. A hidden GIN_CAT_EMPTY_QUERY is added only for the first empty ALL-scankey,
NULLs in other columns are filtered out with GIN_CAT_NULL_KEY. This looks like
asymmetric, and it leads to accelerations is some cases and slowdowns in others
(depending on NULL fractions and their correlations in columns).

The following test shows a significant performance regression of v9:

insert into t select array[i], NULL, NULL from generate_series(1, 1000000) i;

| Query time, ms
WHERE condition | master | v8 | v9
---------------------------------------+--------+--------+---------
a @> '{}' | 224 | 213 | 212
a @> '{}' and b @> '{}' | 52 | 57 | 255
a @> '{}' and b @> '{}' and c @> '{}' | 51 | 58 | 290

In the older version of the patch I tried to do the similar things (initialize
only one NOT_NULL entry for the first column), but refused to do this in v8.

So, to avoid slowdowns relative to master, I can offer simply to add
GIN_CAT_EMPTY_QUERY entry for each column with empty ALL-keys if there are
no normal keys.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-12-27 02:24:01 Re: [HACKERS] Block level parallel vacuum
Previous Message tsunakawa.takay@fujitsu.com 2019-12-27 01:26:03 RE: Libpq support to connect to standby server as priority