Re: Fixing GIN for empty/null/full-scan cases

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Fixing GIN for empty/null/full-scan cases
Date: 2011-01-07 16:07:48
Message-ID: 20044.1294416468@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> 2. Add another output bool parameter to extractQuery that it must set
> true (from a default false state) if the query could match with no check
> values set. This would prompt the GIN code to search for EMPTY_ITEM
> placeholders, but they'd not be part of the check[] array.

On further reflection: if we're going to go this route, we really ought
to take one more step and allow the opclass to demand a full-index scan.
The reason for this is cases like tsvector's NOT operator:

    SELECT ... WHERE tsvectorcol @@ '! unwanted'::tsquery

Right now, this will do what it says on the tin if implemented as a
seqscan. It will fail (silently, I think) if implemented as a GIN index
search. We didn't use to have any way of making it behave sanely as
an index search, but the mechanisms I'm building now would support doing
this right.

So, instead of just a bool, I'm now proposing adding an int return
argument specified like this:

    searchMode is an output argument that allows extractQuery to specify
    details about how the search will be done. If *searchMode is set to
    GIN_SEARCH_MODE_DEFAULT (which is the value it is initialized to
    before call), only items that match at least one of the returned
    keys are considered candidate matches. If *searchMode is set to
    GIN_SEARCH_MODE_INCLUDE_EMPTY, then in addition to items containing
    at least one matching key, items that contain no keys at all are
    considered candidate matches. (This mode is useful for implementing
    is-subset-of operators, for example.) If *searchMode is set to
    GIN_SEARCH_MODE_ALL, then all non-null items in the index are
    considered candidate matches, whether they match any of the returned
    keys or not. (This mode is much slower than the other two choices,
    since it requires scanning essentially the entire index, but it may
    be necessary to implement corner cases correctly. An operator that
    needs this mode in most cases is probably not a good candidate for a
    GIN operator class.) The symbols to use for setting this mode are
    defined in access/gin.h.
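
To make the three modes concrete, here is a toy model of the candidate test
in standalone C. It is only a sketch of the semantics described above, not
GIN code: ToyItem, toy_has_any_key and toy_is_candidate are hypothetical
names, the numeric values of the GIN_SEARCH_MODE_* symbols are placeholders,
and treating a null item as never being a candidate in the first two modes
is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder values; the real symbols would come from access/gin.h. */
#define GIN_SEARCH_MODE_DEFAULT 0
#define GIN_SEARCH_MODE_INCLUDE_EMPTY 1
#define GIN_SEARCH_MODE_ALL 2

/* Hypothetical stand-in for one indexed item: a set of integer keys. */
typedef struct ToyItem
{
    bool        isnull;     /* the indexed value itself is NULL */
    size_t      nkeys;      /* number of keys extracted from the item */
    const int  *keys;       /* the keys; may be NULL when nkeys == 0 */
} ToyItem;

/* Does the item contain at least one of the query keys? */
static bool
toy_has_any_key(const ToyItem *item, const int *qkeys, size_t nqkeys)
{
    for (size_t i = 0; i < item->nkeys; i++)
    {
        for (size_t j = 0; j < nqkeys; j++)
        {
            if (item->keys[i] == qkeys[j])
                return true;
        }
    }
    return false;
}

/* Is the item a candidate match under the given search mode? */
bool
toy_is_candidate(const ToyItem *item, const int *qkeys, size_t nqkeys,
                 int searchMode)
{
    assert(item != NULL);

    /* Assumption of this sketch: null items are never candidates. */
    if (item->isnull)
        return false;

    switch (searchMode)
    {
        case GIN_SEARCH_MODE_ALL:
            /* every non-null item, whether or not any key matches */
            return true;
        case GIN_SEARCH_MODE_INCLUDE_EMPTY:
            /* key matches, plus items that contain no keys at all */
            return item->nkeys == 0 ||
                toy_has_any_key(item, qkeys, nqkeys);
        default:
            /* GIN_SEARCH_MODE_DEFAULT: at least one key must match */
            return toy_has_any_key(item, qkeys, nqkeys);
    }
}
```

In this model an item with no keys is rejected in default mode and accepted
in include-empty mode, while an item whose keys all miss the query is
accepted only in ALL mode. The candidates would still have to go through
the opclass's consistency check.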

The default mode is equivalent to what used to happen implicitly, so
this is still backwards-compatible with existing opclasses.
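
The compatibility argument can be sketched the same way: the caller sets the
mode to the default before the call, so an extractQuery that never touches
the argument gets the old behavior. Everything below is hypothetical
(ToyExtractQuery, toy_legacy_extract, toy_negation_aware_extract,
toy_plan_search, and the placeholder mode values), and the leading-'!' test
is a crude stand-in for real tsquery analysis.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values; the real symbols would come from access/gin.h. */
#define GIN_SEARCH_MODE_DEFAULT 0
#define GIN_SEARCH_MODE_INCLUDE_EMPTY 1
#define GIN_SEARCH_MODE_ALL 2

/* Hypothetical, heavily simplified extractQuery signature. */
typedef void (*ToyExtractQuery) (const char *query, int *searchMode);

/* An existing opclass function: it knows nothing about searchMode. */
void
toy_legacy_extract(const char *query, int *searchMode)
{
    (void) query;
    (void) searchMode;      /* left at whatever the caller set */
}

/* An updated function that demands a full scan for a negated query. */
void
toy_negation_aware_extract(const char *query, int *searchMode)
{
    if (query[0] == '!')
        *searchMode = GIN_SEARCH_MODE_ALL;
}

/* Caller side: initialize to the default, then let the opclass decide. */
int
toy_plan_search(ToyExtractQuery extract, const char *query)
{
    int         searchMode = GIN_SEARCH_MODE_DEFAULT;

    assert(extract != NULL && query != NULL);
    extract(query, &searchMode);
    return searchMode;
}
```

With the legacy function the result is always GIN_SEARCH_MODE_DEFAULT,
whatever the query. Only the updated function can ask for
GIN_SEARCH_MODE_ALL, and here it does so only for the negated query.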

Don't have code to back up this spec yet, but I believe I see how to do
it.

regards, tom lane
