Re: Avoid full GIN index scan when possible

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Marc Cousin <cousinmarc(at)gmail(dot)com>
Subject: Re: Avoid full GIN index scan when possible
Date: 2020-01-11 12:21:15
Message-ID: CAOBaU_YJadWvDVO3FQ_hddaTHSreWre6hPSNw=4wY6SskrvLuA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Sat, Jan 11, 2020 at 1:10 AM Alexander Korotkov
<a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>
> On Fri, Jan 10, 2020 at 7:36 PM Alexander Korotkov
> <a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> >
> > On Fri, Jan 10, 2020 at 6:31 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >>
> >> Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> writes:
> >> > So, I think v10 is a version of patch, which can be committed after
> >> > some cleanup. And we can try doing better nulls handling in a separate
> >> > patch.
> >>
> >> The cfbot reports that this doesn't pass regression testing.
> >> I haven't looked into why not.
> >
> >
> > Thank you for noticing. I'll take care of it.
>
>
> In the v10 I've fixed a bug with nulls handling, but it appears that
> test contained wrong expected result. I've modified this test so that
> it directly compares sequential scan results with bitmap indexscan
> results.

Thanks a lot for working on that! I'm not very familiar with gin
internals so additional eyes are definitely needed here but I agree
that this approach is simpler and cleaner. I didn't find any problem
with the modified logic, the patch applies cleanly, compiles without
warning and all regression tests pass, so it all seems good to me.

Here are a few comments:

- In keyGetItem(), it seems that some comments would need to be
updated wrt. the new excludeOnly flag. I'm thinking of:

* Ok, we now know that there are no matches < minItem.
*
* If minItem is lossy, it means that there were no exact items on the
* page among requiredEntries, because lossy pointers sort after exact
* items. However, there might be exact items for the same page among
* additionalEntries, so we mustn't advance past them.

and

/*
* Normally, none of the items in additionalEntries can have a curItem
* larger than minItem. But if minItem is a lossy page, then there
* might be exact items on the same page among additionalEntries.
*/ if (ginCompareItemPointers(&entry->curItem, &minItem) < 0)
{
Assert(ItemPointerIsLossyPage(&minItem) || key->nrequired == 0);
minItem = entry->curItem;
}

While at it, IIUC only excludeOnly key can have nrequired == 0 (if
that's the case, this could be explicitly said in startScanKey
relevant comment), so it'd be more consistent to also use excludeOnly
rather than nrequired in this assert?

- the pg_trgm regression tests check for the number of rows returned
with the new "excludeOnly" permutations, but only with an indexscan,
should we make sure that the same results are returned with a seq
scan?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-01-11 12:25:01 Re: Amcheck: do rightlink verification with lock coupling
Previous Message Dent John 2020-01-11 12:04:05 Re: [WIP] UNNEST(REFCURSOR): allowing SELECT to consume data from a REFCURSOR