Re: Incorrect behaviour when using a GiST index on points

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Incorrect behaviour when using a GiST index on points
Date: 2012-11-02 12:05:30
Message-ID: CAPpHfduzPjwO-bUkJdniQ-K-c3x-uh9ZhtFf1PLr_6hKstn7fQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 18, 2012 at 11:18 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:

> On Thu, Oct 11, 2012 at 07:17:28AM -0400, Noah Misch wrote:
> > On Tue, Oct 02, 2012 at 01:58:40PM -0400, Noah Misch wrote:
> > > > > On Mon, Aug 27, 2012 at 7:43 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> wrote:
> > > > >> There's also the big-picture question of whether we should just
> get rid
> > > > >> of fuzzy comparisons in the geometric types instead of trying to
> hack
> > > > >> indexes to work around them.
> >
> > > In any event, I think we should entertain a patch to make the GiST
> operator
> > > class methods bug-compatible with corresponding operators. Even if we
> decide
> > > to change operator behavior in HEAD, the back branches could use it.
> >
> > We have broad agreement that the specific implementation of fuzz in
> geometric
> > comparison operators is shoddy, but nobody has voiced interest in
> designing a
> > concrete improvement. I propose adding a TODO item "Remove or improve
> > rounding in geometric comparison operators", endorsing Alexander's
> design, and
> > reviewing his patch. Objections?
>
> TODO added, and here's a review:
>
> The patch adds no regression tests; it should add tests illustrating the
> problems it fixes.
>

I've added some tests to points.sql.

> I audited the other indexable geometric operators for similar problems.
> This
> passage in gist_point_consistent_internal(), which handles (point,point)
> operators, caught my suspicion:
>
> case RTSameStrategyNumber:
> if (isLeaf)
> {
> result = FPeq(key->low.x, query->x)
> && FPeq(key->low.y, query->y);
> }
> else
> {
> result = (query->x <= key->high.x &&
> query->x >= key->low.x &&
> query->y <= key->high.y
> && query->y >= key->low.y);
> }
> break;
>
> A leaf entry reachable from an internal entry may fall exactly on the
> internal-entry bounding box. Since we would accept a fuzzy match at the
> leaf
> level, I think we must also accept a fuzzy match at the internal level.

Good catch, fixed.

> > *** a/src/backend/access/gist/gistproc.c
> > --- b/src/backend/access/gist/gistproc.c
>
> > ***************
> > *** 1326,1331 **** gist_point_consistent(PG_FUNCTION_ARGS)
> > --- 1327,1333 ----
> > bool *recheck = (bool *) PG_GETARG_POINTER(4);
> > bool result;
> > StrategyNumber strategyGroup = strategy / GeoStrategyNumberOffset;
> > + BOX *query, *key;
>
> This function now has "query" variables within subsidiary blocks redundant
> with and masking this one. Avoid doing that.

Fixed.

>
> > switch (strategyGroup)
> > {
> > ***************
> > *** 1337,1348 **** gist_point_consistent(PG_FUNCTION_ARGS)
> > *recheck = false;
> > break;
> > case BoxStrategyNumberGroup:
> > ! result = DatumGetBool(DirectFunctionCall5(
> > !
> gist_box_consistent,
> > !
> PointerGetDatum(entry),
> > !
> PG_GETARG_DATUM(1),
> > !
> Int16GetDatum(RTOverlapStrategyNumber),
> > !
> 0, PointerGetDatum(recheck)));
> > break;
> > case PolygonStrategyNumberGroup:
> > {
> > --- 1339,1356 ----
> > *recheck = false;
> > break;
> > case BoxStrategyNumberGroup:
> > ! /*
> > ! * This code repeats logic of on_ob which uses
> simple comparison
> > ! * rather than FP* functions.
> > ! */
> > ! query = PG_GETARG_BOX_P(1);
> > ! key = DatumGetBoxP(entry->key);
> > !
> > ! *recheck = false;
> > ! result = key->high.x >= query->low.x &&
> > ! key->low.x <= query->high.x &&
> > ! key->high.y >= query->low.y &&
> > ! key->low.y <= query->high.y;
>
> For leaf entries, this correctly degenerates to on_pb(). For internal
> entries, it must, but does not, implement box_overlap(). (The fuzzy
> box_overlap() would be fine.) I recommend making gist_point_consistent()'s
> treatment of boxes resemble its treatment of circles and polygons; that
> eases
> verifying their correctness. Call gist_box_consistent. Then, for leaf
> entries, call box_contain_pt().
>

I have two objections on doing that:
1) It's not evident for me that fuzzy comparison in internal pages is fine.
Obviously, it depends on data distribution. It's easy to provide an example
when fuzzy comparison will lead to significant performance degradation.
2) With PolygonStrategyNumberGroup CircleStrategyNumberGroup it's faster to
do simple box comparison than doing calculation for exact circle and
especially polygon check. In this case previous filtering in leaf pages
looks reasonable. With BoxStrategyNumberGroup exact calculation is simpler
than gist_box_consistent.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
gistproc_fix-2.patch application/octet-stream 6.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2012-11-02 12:46:41 Re: Incorrect behaviour when using a GiST index on points
Previous Message Pavan Deolasee 2012-11-02 12:02:16 Re: Problem Observed in behavior of Create Index Concurrently and Hot Update