Re: Avoid full GIN index scan when possible

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Marc Cousin <cousinmarc(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Avoid full GIN index scan when possible
Date: 2019-08-01 18:59:00
Message-ID: 17590.1564685940@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Julien Rouhaud <rjuju123(at)gmail(dot)com> writes:
> On Thu, Aug 1, 2019 at 4:37 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Eyeing this a bit further ... doesn't scanPendingInsert also need
>> to honor so->forcedRecheck? Something along the lines of

> I think so.

Yeah, it does --- the updated pg_trgm test attached fails if it doesn't.

Also, I found that Alexander's concern upthread:

>> What would happen when two-columns index have GIN_SEARCH_MODE_DEFAULT
>> scan on first column and GIN_SEARCH_MODE_ALL on second? I think even
>> if triconsistent() for second column returns GIN_TRUE, we still need
>> to recheck to verify second columns is not NULL.

is entirely on-point. This patch generates the wrong answer in the
case I added to gin.sql below. (The expected output was generated
with HEAD and seems correct, but with these code changes, we incorrectly
report the row with NULL as matching. So I expect the cfbot is going
to complain about the patch in this state.)

While I've not attempted to fix that here, I wonder whether we shouldn't
fix it by just forcing forcedRecheck to true in any case where we discard
an ALL qualifier. That would get rid of all the ugliness around
ginFillScanKey, which I'd otherwise really want to refactor to avoid
this business of adding and then removing a scan key. It would also
get rid of the bit about "XXX Need to use ALL mode instead of EVERYTHING
to skip NULLs if ALL mode has been seen", which aside from being ugly
seems to be dead wrong for multi-column-index cases.

BTW, it's not particularly the fault of this patch, but: what does it
even mean to specify GIN_SEARCH_MODE_ALL with a nonzero number of keys?
Should we decide to treat that as an error? It doesn't look to me like
any of the in-tree opclasses will return such a case, and I'm not at
all convinced what the GIN scan code would actually do with it, except
that I doubt it matches the documentation.

Setting this back to Waiting on Author.

regards, tom lane

Attachment Content-Type Size
avoid_gin_fullscan-v5.patch text/x-diff 11.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-08-01 19:13:23 Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)
Previous Message Corey Huinker 2019-08-01 18:57:19 Re: \describe*