Re: Avoid full GIN index scan when possible

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Marc Cousin <cousinmarc(at)gmail(dot)com>
Subject: Re: Avoid full GIN index scan when possible
Date: 2019-06-28 22:23:13
Message-ID: CAOBaU_bDF-7E5x0WVPmmtoc+0z7S3hmcXnnV+1ggy8CMGyZxgg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 28, 2019 at 10:16 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
> > On Fri, Jun 28, 2019 at 03:03:19PM -0400, Tom Lane wrote:
> >> I not only don't want that function in indxpath.c, I don't even want
> >> it to be known/called from there. If we need the ability for the index
> >> AM to editorialize on the list of indexable quals (which I'm not very
> >> convinced of yet), let's make an AM interface function to do it.
>
> > Wouldn't it be better to have a function that inspects a single qual and
> > says whether it's "optimizable" or not? That could be part of the AM
> > implementation, and we'd call it and it'd be us messing with the list.
>
> Uh ... we already determined that the qual is indexable (ie is a member
> of the index's opclass), or allowed the index AM to derive an indexable
> clause from it, so I'm not sure what you envision would happen
> additionally there. If I understand what Julien is concerned about
> --- and I may not --- it's that the set of indexable clauses *as a whole*
> may have or lack properties of interest. So I'm thinking the answer
> involves some callback that can do something to the whole list, not
> qual-at-a-time. We've already got facilities for the latter case.

Yes, the root issue here is that with gin it's entirely possible that
"WHERE sometable.col op value1" is way more efficient than "WHERE
sometable.col op value AND sometable.col op value2", where both qual
are determined indexable by the opclass. The only way to avoid that
is indeed to inspect the whole list, as done in this poor POC.

This is a problem actually hit in production, and as far as I know
there's no easy way from the application POV to prevent unexpected
slowdown. Maybe Marc will have more details about the actual problem
and how expensive such a case was compared to the normal ones.

> > But that kinda resembles stuff we already have - selectivity/cost. So
> > why shouldn't this be considered as part of costing?
>
> Yeah, I'm not entirely convinced that we need anything new here.
> The cost estimate function can detect such situations, and so can
> the index AM at scan start --- for example, btree checks for
> contradictory quals at scan start. There's a certain amount of
> duplicative effort involved there perhaps, but you also have to
> keep in mind that we don't know the values of run-time-determined
> comparison values until scan start. So if you want certainty rather
> than just a cost estimate, you may have to do these sorts of checks
> at scan start.

Ah, I didn't know about _bt_preprocess_keys(). I'm not familiar with
this code, so please bear with me. IIUC the idea would be to add
additional logic in gingetbitmap() / ginNewScanKey() to drop some
quals at runtime. But that would mean that additional logic would
also be required in BitmapHeapScan, or that all the returned bitmap
should be artificially marked as lossy to enforce a recheck?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2019-06-28 22:24:03 Re: [PATCH] Implement uuid_version()
Previous Message Alvaro Herrera 2019-06-28 22:10:49 Re: BUG #15383: Join Filter cost estimation problem in 10.5