Re: Avoid full GIN index scan when possible

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Marc Cousin <cousinmarc(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Avoid full GIN index scan when possible
Date: 2019-08-01 19:15:22
Message-ID: CAPpHfdvfK+Zk=MjjC1M_66+ALD+vdR6BG3quoFAZzjL6hNv-NQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 1, 2019 at 9:59 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Julien Rouhaud <rjuju123(at)gmail(dot)com> writes:
> > On Thu, Aug 1, 2019 at 4:37 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Eyeing this a bit further ... doesn't scanPendingInsert also need
> >> to honor so->forcedRecheck? Something along the lines of
>
> > I think so.
>
> Yeah, it does --- the updated pg_trgm test attached fails if it doesn't.
>
> Also, I found that Alexander's concern upthread:
>
> >> What would happen when two-columns index have GIN_SEARCH_MODE_DEFAULT
> >> scan on first column and GIN_SEARCH_MODE_ALL on second? I think even
> >> if triconsistent() for second column returns GIN_TRUE, we still need
> >> to recheck to verify second columns is not NULL.
>
> is entirely on-point. This patch generates the wrong answer in the
> case I added to gin.sql below. (The expected output was generated
> with HEAD and seems correct, but with these code changes, we incorrectly
> report the row with NULL as matching. So I expect the cfbot is going
> to complain about the patch in this state.)
>
> While I've not attempted to fix that here, I wonder whether we shouldn't
> fix it by just forcing forcedRecheck to true in any case where we discard
> an ALL qualifier. That would get rid of all the ugliness around
> ginFillScanKey, which I'd otherwise really want to refactor to avoid
> this business of adding and then removing a scan key. It would also
> get rid of the bit about "XXX Need to use ALL mode instead of EVERYTHING
> to skip NULLs if ALL mode has been seen", which aside from being ugly
> seems to be dead wrong for multi-column-index cases.

+1 for setting forcedRecheck in any case we discard ALL qualifier.
ISTM, real life number of cases we can skip recheck here is
negligible. And it doesn't justify complexity.

> BTW, it's not particularly the fault of this patch, but: what does it
> even mean to specify GIN_SEARCH_MODE_ALL with a nonzero number of keys?

It might mean we would like to see all the results, which don't
contain given key.

> Should we decide to treat that as an error? It doesn't look to me like
> any of the in-tree opclasses will return such a case, and I'm not at
> all convinced what the GIN scan code would actually do with it, except
> that I doubt it matches the documentation.

I think tsvector_ops behaves so. See gin_extract_tsquery().

/*
* If the query doesn't have any required positive matches (for
* instance, it's something like '! foo'), we have to do a full index
* scan.
*/
if (tsquery_requires_match(item))
*searchMode = GIN_SEARCH_MODE_DEFAULT;
else
*searchMode = GIN_SEARCH_MODE_ALL;

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-08-01 19:28:43 Re: Avoid full GIN index scan when possible
Previous Message Tom Lane 2019-08-01 19:13:23 Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)