Re: Avoid full GIN index scan when possible

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Marc Cousin <cousinmarc(at)gmail(dot)com>
Subject: Re: Avoid full GIN index scan when possible
Date: 2020-01-17 21:33:14
Message-ID: CAPpHfdtvH7EtVJ_J-LMV-YieHc93mdYANdP5bUL7MwVQwMAFVA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 15, 2020 at 2:03 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> writes:
> > On Tue, Jan 14, 2020 at 9:43 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> One thing I'm still not happy about is the comment in
> >> collectMatchesForHeapRow. v12 failed to touch that at all, so I tried to
> >> fill it in, but I'm not sure if my explanation is good.
>
> > I've tried to rephrase this comment making it better from my point of
> > view. It's hard for me to be sure about this, since I'm not native
> > English speaker. I'd like you to take a look on it.
>
> Yeah, that's not great as-is. Maybe like
>
> + * All scan keys except excludeOnly require at least one entry to match.
> + * excludeOnly keys are an exception, because their implied
> + * GIN_CAT_EMPTY_QUERY scanEntry always matches. So return "true"
> + * if all non-excludeOnly scan keys have at least one match.

Looks good to me.

> >> Also, if we know
> >> that excludeOnly keys are going to be ignored, can we save any work in
> >> the main loop of that function?
>
> > It doesn't look so for me. We still need to collect matches for
> > consistent function call afterwards.
>
> Ah, right.
>
> > I also had concerns about how excludeOnly keys work with lossy pages.
> > I didn't find exact error. But I've added code, which skips
> > excludeOnly keys checks for lossy pages. They aren't going to exclude
> > any lossy page anyway. So, we can save some resources by skipping
> > this.
>
> Hmm ... yeah, these test cases are not large enough to exercise any
> lossy-page cases, are they? I doubt we should try to make a new regression
> test that is that big. (But if there is one already, maybe we could add
> more test queries with it, instead of creating whole new tables?)

I've checked that none of existing tests for GIN can produce lossy
bitmap page with minimal work_mem = '64kB'. I've tried to generate
sample table with single integer column to get lossy page. It appears
that we need at least 231425 rows to get it. With wider rows, we
would need less number of rows, but I think total heap size wouldn't
be less.

So, I think we don't need so huge regression test to exercise this corner case.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0001-Avoid-GIN-full-scan-for-empty-ALL-keys-v15.patch application/octet-stream 32.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2020-01-17 21:33:48 Re: making the backend's json parser work in frontend code
Previous Message Robert Haas 2020-01-17 21:24:21 Re: making the backend's json parser work in frontend code