Re: Avoid full GIN index scan when possible

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marc Cousin <cousinmarc(at)gmail(dot)com>
Subject: Re: Avoid full GIN index scan when possible
Date: 2019-06-29 14:27:26
Message-ID: 20190629142726.gljead3qhbywahbx@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jun 29, 2019 at 03:28:11PM +0200, Julien Rouhaud wrote:
>On Sat, Jun 29, 2019 at 3:11 PM Tomas Vondra
><tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>
>> On Sat, Jun 29, 2019 at 02:50:51PM +0200, Julien Rouhaud wrote:
>> >On Sat, Jun 29, 2019 at 12:25 PM Tomas Vondra
>> >> A related issue is that during costing is too late to modify cardinality
>> >> estimates, so the 'Bitmap Index Scan' will be expected to return fewer
>> >> rows than it actually returns (after ignoring the full-scan quals).
>> >> Ignoring redundant quals (the way btree does it at execution) does not
>> >> have such consequence, of course.
>> >>
>> >> Which may be an issue, because we essentially want to modify the list of
>> >> quals to minimize the cost of
>> >>
>> >> bitmap index scan + recheck during bitmap heap scan
>> >>
>> >> OTOH it's not a huge issue, because it won't affect the rest of the plan
>> >> (because that uses the bitmap heap scan estimates, and those are not
>> >> affected by this).
>> >
>> >Doesn't this problem already exists, as the quals that we could drop
>> >can't actually reduce the node's results?
>>
>> How could it not reduce the node's results, if you ignore some quals
>> that are not redundant? My understanding is we have a plan like this:
>>
>> Bitmap Heap Scan
>> -> Bitmap Index Scan
>>
>> and by ignoring some quals at the index scan level, we trade the (high)
>> cost of evaluating the qual there for a plain recheck at the bitmap heap
>> scan. But it means the index scan may produce more rows, so it's only a
>> win if the "extra rechecks" are cheaper than the (removed) full scan.
>
>Sorry, by node I meant the BitmapIndexScan. AIUI, if you have for
>instance WHERE val LIKE '%abcde%' AND val LIKE '%z%' and a trgm index,
>the BitmapIndexScan will have to through the whole index and discard
>rows based on the only opclass-optimizable qual (LIKE '%abcde%'),
>letting the recheck do the proper filtering for the other qual. So
>whether you have the LIKE '%z%' or not in the index scan, the
>BitmapIndexScan will return the same number of rows, the only
>difference being that in one case you'll have to scan the whole index,
>while in the other case you won't.
>

Oh! I thought 'full scan' means we have to scan all the keys in the GIN
index, but we can still eliminate some of the keys (for example for the
trigrams we might check if the trigram contains the short string). But
clearly I was mistaken and it does not work like that ...

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message didier 2019-06-29 16:16:45 Re: Fix runtime errors from -fsanitize=undefined
Previous Message Tomas Vondra 2019-06-29 14:13:12 Re: mcvstats serialization code is still shy of a load