Re: GIN versus zero-key queries

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GIN versus zero-key queries
Date: 2009-04-15 23:44:56
Message-ID: 200904152344.n3FNiuH05971@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Where are we on this?

---------------------------------------------------------------------------

Tom Lane wrote:
> Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
> >> We have an API definition by which extractQuery can distinguish "all
> >> match" from "no match". If we just legislate that "some match" isn't
> >> a valid behavior for zero-key queries, then the code is correct and the
>
> > Right now I don't see an example with zero keys and "some match", consistent
> > method will not have any information about indexed tuple and hence it could not
> > produce any reasonable result. It seems to me, that paragraph should be removed
> > at all.
>
> Well, we still have to document the fact that GIN fails when presented
> with a query that would require a full-index scan. I've updated section
> 52.5 as attached. (I removed the claim that multiple matches were a
> problem, since this is obviously not true for a bitmap scan, and I
> suppose that the plain indexscan code must have had a way to deal with
> it too.)
>
> More generally, though, I find myself quite unhappy with the fact that
> GIN doesn't provide reasonable behavior for the no-keys corner cases.
> If we are going to provide operator classes that claim to implement
> index acceleration of standard operators, it is not okay for them
> to not match the exact semantics of those operators. Right now it's
> a mess for empty arrays, and it's even more of a mess for arrays
> containing nulls. array_contain_compare() considers nulls as never
> matching, which means that
>
> regression=# select array[1,null] <@ array[1,null];
> ?column?
> ----------
> f
> (1 row)
>
> which is entirely bizarre when you compare that result to
>
> regression=# select array[1,null] = array[1,null];
> ?column?
> ----------
> t
> (1 row)
>
> It's obviously too late to do anything about this for 8.4, but I would
> like to have a TODO item to figure out how to do this right. We need to
> adjust the behavior of the operators to be consistent, and then we need
> to make it possible for GIN to implement them exactly. I wonder whether
> we should not change GIN to automatically do something reasonable for
> empty indexed values, ie stick them into the index in some special way
> denoting "no indexable keys for this item". The dummy-value solution
> is not something that operator classes should have to come up with,
> and not all data types present a reasonable way to have dummy values
> anyway.
>
> regards, tom lane
>
>
> <para>
> <acronym>GIN</acronym> doesn't support full index scans. The reason for
> this is that <function>extractValue</> is allowed to return zero keys,
> as for example might happen with an empty string or empty array. In such
> a case the indexed value will be unrepresented in the index. It is
> therefore impossible for <acronym>GIN</acronym> to guarantee that a
> scan of the index can find every row in the table.
> </para>
>
> <para>
> Because of this limitation, when <function>extractQuery</function> returns
> <literal>nkeys = 0</> to indicate that all values match the query,
> <acronym>GIN</acronym> will emit an error. (If there are multiple ANDed
> indexable operators in the query, this happens only if they all return zero
> for <literal>nkeys</>.)
> </para>
>
> <para>
> It is possible for an operator class to circumvent the restriction against
> full index scan. To do that, <function>extractValue</> must return at least
> one (possibly dummy) key for every indexed value, and
> <function>extractQuery</function> must convert an unrestricted search into
> a partial-match query that will scan the whole index. This is inefficient
> but might be necessary to avoid corner-case failures with operators such
> as <literal>LIKE</> or subset inclusion.
> </para>
>
> <para>
> <acronym>GIN</acronym> assumes that indexable operators are strict.
> This means that <function>extractValue</> will not be called at all on
> a NULL value (so the value will go unindexed), and
> <function>extractQuery</function> will not be called on a NULL comparison
> value either (instead, the query is presumed to be unmatchable).
> </para>
>
> <para>
> A possibly more serious limitation is that <acronym>GIN</acronym> cannot
> handle NULL keys &mdash; for example, an array containing a NULL cannot
> be handled except by ignoring the NULL.
> </para>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2009-04-16 00:46:11 Re: psql with "Function Type" in \df
Previous Message Bruce Momjian 2009-04-15 23:43:56 Re: libpq is not thread safe