Re: Fixing GIN for empty/null/full-scan cases

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Fixing GIN for empty/null/full-scan cases
Date: 2011-01-07 00:31:20
Message-ID: 5057.1294360280@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Attached is a WIP patch along the lines I suggested earlier: this covers
the core code and docs, but doesn't yet do anything to the various
opclass-specific functions. Testing it, I find that there was a gap in
my previous thinking. The patch works for cases like

select ... where arraycol = '{}'

which previously failed for lack of ability to do a full-index scan.
However, it doesn't work for cases like

select ... where arraycol <@ '{1,2}'

This correctly returns {1}, {2}, and {1,2} --- but it doesn't find {}.
There are empty-item placeholder entries in the index for the latter,
but since the query does contain some real keys, it's not a full-index
scan and thus it doesn't look for the empty-item placeholders.

There are basically two ways we could fix this: with or without the
extractQuery function's cooperation. Without any help from
extractQuery, it would be necessary for *every* GIN query to include
empty-item placeholders in the search and then see if the consistentFn
would succeed on those rows. That seems pretty unattractive from an
efficiency standpoint, not to mention that it opens the possibility
I mentioned before of old consistentFns assuming without checking
that there's at least one true check[] entry.

What seems like a better idea is for extractQuery to explicitly tell
us whether or not an empty indexed item can match the query. In the
terms of this patch, that amounts to including a GIN_CAT_EMPTY_ITEM
placeholder as one of the search targets. Now, there are two ways
we could do that, too:

1. Explicitly include the EMPTY_ITEM target as part of the Datum[] array
returned by extractQuery. That seems a bit grotty on a couple of
grounds, one being that we couldn't use just a bool isNull[] auxiliary
array but would have to expose GinNullCategory or something similar to
the opclasses. Another problem with it is that then the "empty" target
would presumably be included in the check[] array, with the result that
an "empty" heap item would have one check[] value set, which would mess
up simple AND/OR combination logic such as you see in
ginarrayconsistent().

2. Add another output bool parameter to extractQuery that it must set
true (from a default false state) if the query could match with no check
values set. This would prompt the GIN code to search for EMPTY_ITEM
placeholders, but they'd not be part of the check[] array.

With either one of these approaches, we can avoid the
backwards-compatibility problem of a consistentFn being passed cases
that it's not expecting, because it's not going to see any such cases
without an update to its cohort extractQueryFn. (In particular, if the
extractQueryFn returns zero keys and doesn't take an additional action
to say that an empty item can match, then we'll treat that as an
unsatisfiable query instead of calling a consistentFn that might
misbehave on such a case.)

I think I like option #2 better. Comments?

regards, tom lane

Attachment Content-Type Size
gin-nulls-1.patch.gz application/octet-stream 48.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Cédric Villemain 2011-01-07 00:47:39 Re: Streaming base backups
Previous Message Magnus Hagander 2011-01-06 23:01:52 Re: Streaming base backups