Re: Incorrect behaviour when using a GiST index on points

From: Noah Misch <noah(at)leadboat(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Incorrect behaviour when using a GiST index on points
Date: 2012-10-18 19:18:28
Message-ID: 20121018191828.GB10844@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 11, 2012 at 07:17:28AM -0400, Noah Misch wrote:
> On Tue, Oct 02, 2012 at 01:58:40PM -0400, Noah Misch wrote:
> > > > On Mon, Aug 27, 2012 at 7:43 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > > >> There's also the big-picture question of whether we should just get rid
> > > >> of fuzzy comparisons in the geometric types instead of trying to hack
> > > >> indexes to work around them.
>
> > In any event, I think we should entertain a patch to make the GiST operator
> > class methods bug-compatible with corresponding operators. Even if we decide
> > to change operator behavior in HEAD, the back branches could use it.
>
> We have broad agreement that the specific implementation of fuzz in geometric
> comparison operators is shoddy, but nobody has voiced interest in designing a
> concrete improvement. I propose adding a TODO item "Remove or improve
> rounding in geometric comparison operators", endorsing Alexander's design, and
> reviewing his patch. Objections?

TODO added, and here's a review:

The patch adds no regression tests; it should add tests illustrating the
problems it fixes.

I audited the other indexable geometric operators for similar problems. This
passage in gist_point_consistent_internal(), which handles (point,point)
operators, caught my suspicion:

case RTSameStrategyNumber:
if (isLeaf)
{
result = FPeq(key->low.x, query->x)
&& FPeq(key->low.y, query->y);
}
else
{
result = (query->x <= key->high.x && query->x >= key->low.x &&
query->y <= key->high.y && query->y >= key->low.y);
}
break;

A leaf entry reachable from an internal entry may fall exactly on the
internal-entry bounding box. Since we would accept a fuzzy match at the leaf
level, I think we must also accept a fuzzy match at the internal level.

> *** a/src/backend/access/gist/gistproc.c
> --- b/src/backend/access/gist/gistproc.c

> ***************
> *** 1326,1331 **** gist_point_consistent(PG_FUNCTION_ARGS)
> --- 1327,1333 ----
> bool *recheck = (bool *) PG_GETARG_POINTER(4);
> bool result;
> StrategyNumber strategyGroup = strategy / GeoStrategyNumberOffset;
> + BOX *query, *key;

This function now has "query" variables within subsidiary blocks redundant
with and masking this one. Avoid doing that.

>
> switch (strategyGroup)
> {
> ***************
> *** 1337,1348 **** gist_point_consistent(PG_FUNCTION_ARGS)
> *recheck = false;
> break;
> case BoxStrategyNumberGroup:
> ! result = DatumGetBool(DirectFunctionCall5(
> ! gist_box_consistent,
> ! PointerGetDatum(entry),
> ! PG_GETARG_DATUM(1),
> ! Int16GetDatum(RTOverlapStrategyNumber),
> ! 0, PointerGetDatum(recheck)));
> break;
> case PolygonStrategyNumberGroup:
> {
> --- 1339,1356 ----
> *recheck = false;
> break;
> case BoxStrategyNumberGroup:
> ! /*
> ! * This code repeats logic of on_ob which uses simple comparison
> ! * rather than FP* functions.
> ! */
> ! query = PG_GETARG_BOX_P(1);
> ! key = DatumGetBoxP(entry->key);
> !
> ! *recheck = false;
> ! result = key->high.x >= query->low.x &&
> ! key->low.x <= query->high.x &&
> ! key->high.y >= query->low.y &&
> ! key->low.y <= query->high.y;

For leaf entries, this correctly degenerates to on_pb(). For internal
entries, it must, but does not, implement box_overlap(). (The fuzzy
box_overlap() would be fine.) I recommend making gist_point_consistent()'s
treatment of boxes resemble its treatment of circles and polygons; that eases
verifying their correctness. Call gist_box_consistent. Then, for leaf
entries, call box_contain_pt().

GiST "consistent" functions often validate the strategy number, but the
circle, polygon and box branches of gist_point_consistent silently assume
strategy % GeoStrategyNumberOffset == RTContainedByStrategyNumber. Should
they verify that assumption? I raise this as an incidental question; it is
not new with your patch.

Thanks,
nm

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-10-18 19:36:26 Re: pg_stat_lwlocks view - lwlocks statistics, round 2
Previous Message Christopher Browne 2012-10-18 19:18:19 Re: [RFC] CREATE QUEUE (log-only table) for londiste/pgQ ccompatibility