Re: Avoid full GIN index scan when possible

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Marc Cousin <cousinmarc(at)gmail(dot)com>
Subject: Re: Avoid full GIN index scan when possible
Date: 2019-06-29 09:10:03
Message-ID: CAOBaU_YgU=WmUN22uGHEjUTymqEpM_aZi=ETUZd2nhEZabBc4Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jun 29, 2019 at 12:51 AM Nikita Glukhov
<n(dot)gluhov(at)postgrespro(dot)ru> wrote:>
> On 29.06.2019 1:23, Julien Rouhaud wrote:
>
> But that kinda resembles stuff we already have - selectivity/cost. So
> why shouldn't this be considered as part of costing?
>
> Yeah, I'm not entirely convinced that we need anything new here.
> The cost estimate function can detect such situations, and so can
> the index AM at scan start --- for example, btree checks for
> contradictory quals at scan start. There's a certain amount of
> duplicative effort involved there perhaps, but you also have to
> keep in mind that we don't know the values of run-time-determined
> comparison values until scan start. So if you want certainty rather
> than just a cost estimate, you may have to do these sorts of checks
> at scan start.
>
> Ah, I didn't know about _bt_preprocess_keys(). I'm not familiar with
> this code, so please bear with me. IIUC the idea would be to add
> additional logic in gingetbitmap() / ginNewScanKey() to drop some
> quals at runtime. But that would mean that additional logic would
> also be required in BitmapHeapScan, or that all the returned bitmap
> should be artificially marked as lossy to enforce a recheck?
>
> We have a similar solution for this problem. The idea is to avoid full index
> scan inside GIN itself when we have some GIN entries, and forcibly recheck
> all tuples if triconsistent() returns GIN_MAYBE for the keys that emitted no
> GIN entries.

Thanks for looking at it. That's I think a way better approach.

> The attached patch in its current shape contain at least two ugly places:
>
> 1. We still need to initialize empty scan key to call triconsistent(), but
> then we have to remove it from the list of scan keys. Simple refactoring
> of ginFillScanKey() can be helpful here.
>
> 2. We need to replace GIN_SEARCH_MODE_EVERYTHING with GIN_SEARCH_MODE_ALL
> if there are no GIN entries and some key requested GIN_SEARCH_MODE_ALL
> because we need to skip NULLs in GIN_SEARCH_MODE_ALL. Simplest example here
> is "array @> '{}'": triconsistent() returns GIN_TRUE, recheck is not forced,
> and GIN_SEARCH_MODE_EVERYTHING returns NULLs that are not rechecked. Maybe
> it would be better to introduce new GIN_SEARCH_MODE_EVERYTHING_NON_NULL.

Also

+ if (searchMode == GIN_SEARCH_MODE_ALL && nQueryValues <= 0)
+ {
+ /*
+ * Don't emit ALL key with no entries, check only whether
+ * unconditional recheck is needed.
+ */
+ GinScanKey key = &so->keys[--so->nkeys];
+
+ hasSearchAllMode = true;
+ so->forcedRecheck = key->triConsistentFn(key) != GIN_TRUE;
+ }

Shouldn't you make sure that the forcedRecheck flag can't reset?

> -- patched
> EXPLAIN ANALYZE SELECT * FROM test WHERE t LIKE '%1234%' AND t LIKE '%1%';
> QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------
> Bitmap Heap Scan on test (cost=20.43..176.79 rows=42 width=6) (actual time=0.287..0.424 rows=300 loops=1)
> Recheck Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
> Rows Removed by Index Recheck: 2
> Heap Blocks: exact=114
> -> Bitmap Index Scan on test_t_idx (cost=0.00..20.42 rows=42 width=0) (actual time=0.271..0.271 rows=302 loops=1)
> Index Cond: ((t ~~ '%1234%'::text) AND (t ~~ '%1%'::text))
> Planning Time: 0.080 ms
> Execution Time: 0.450 ms
> (8 rows)

One thing that's bothering me is that the explain implies that the
LIKE '%i% was part of the index scan, while in reality it wasn't. One
of the reason why I tried to modify the qual while generating the path
was to have the explain be clearer about what is really done.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2019-06-29 09:55:30 Re: Commitfest 2019-07, the first of five* for PostgreSQL 13
Previous Message Masahiko Sawada 2019-06-29 08:55:56 Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)