Re: [PATCH]: fix bug in SP-GiST box_ops

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: n(dot)gluhov(at)postgrespro(dot)ru
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH]: fix bug in SP-GiST box_ops
Date: 2017-01-31 10:04:56
Message-ID: 20170131.190456.177913793.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, thank you for the revised patch.

The only comment from me is about comments on the new over*2D
funnctions.

At Mon, 30 Jan 2017 21:12:31 +0300, Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru> wrote in <4450e7a6-01e7-0fb2-a01e-98fb5405d217(at)postgrespro(dot)ru>
> On 30.01.2017 12:04, Kyotaro HORIGUCHI wrote:
> > Hello,
> >
> > At Mon, 30 Jan 2017 07:12:03 +0300, Nikita Glukhov
> > <n(dot)gluhov(at)postgrespro(dot)ru> wrote
> > in <9ea5b157-478c-8874-bc9b-050076b7d042(at)postgrespro(dot)ru>:
> >> Working on the tests for new SP-GiST opclasses for polygons and
> >> circles, I've found a bug in the SP-GiST box_ops (added in 9.6): some
> >> operators
> >> (&<, &>, $<|, |&>) have wrong tests in
> >> spg_box_quad_inner_consistent().
> >> This obviously leads to incorrect results of a SP-GiST index scan (see
> >> tests
> >> in the attached patch, their results were taken from a sequential
> >> scan).
> >
> > Your problem is not necessarily evident for others. Perhaps you
> > have to provide an explanation and/or a test case that describes
> > what is wrong for you so that others can get a clue for this
> > problem. Simpler test is better.
>
> The problem is that there are mixed low/high values for x/y axes. For
> example,
> overLeft4D() compares x-RangeBox rect_box->range_box_x with y-Range
> &query->right,
> and also lower2D() here must use query->high instead of query->low.
>
> The corresponding test is very simple: insert 10000 nearly arbitrary
> boxes and
> see the difference between results of sequential scan and results of
> index scan.
>
> regression=# CREATE TEMPORARY TABLE quad_box_tbl (b box);
>
> regression=# INSERT INTO quad_box_tbl
> SELECT box(point(x * 10, y * 10), point(x * 10 + 5, y * 10 + 5))
> FROM generate_series(1, 100) x,
> generate_series(1, 100) y;
>
> regression=# CREATE INDEX quad_box_tbl_idx ON quad_box_tbl USING
> spgist(b);

Thank you for the explanation and the test with fixed numbers.
I clearly understand what is the problem.

> regression=# SET enable_seqscan = ON;
> regression=# SET enable_indexscan = OFF;
> regression=# SET enable_bitmapscan = OFF;
>
> regression=# SELECT count(*) FROM quad_box_tbl WHERE b &< box
> '((100,200),(300,500))';
> count
> -------
> 2900
> (1 row)
>
> regression=# SELECT count(*) FROM quad_box_tbl WHERE b &> box
> '((100,200),(300,500))';
> count
> -------
> 9100
> (1 row)
>
> regression=# SELECT count(*) FROM quad_box_tbl WHERE b &<| box
> '((100,200),(300,500))';
> count
> -------
> 4900
> (1 row)
>
> regression=# SELECT count(*) FROM quad_box_tbl WHERE b |&> box
> '((100,200),(300,500))';
> count
> -------
> 8100
> (1 row)
>
> regression=# SET enable_seqscan = OFF;
> regression=# SET enable_indexscan = ON;
> regression=# SET enable_bitmapscan = ON;
(different result for the same queies)

The minimal bad example for the '&<' case is the following.

=# create table t (b box);
=# create index on t using spgist(b);
=# insert into t values ('((215, 305),(210,300))'::box);
=# select * from t where b &< '((100,200),(300,400))'::box;
b
---------------------
(215,305),(210,300)
(1 row)

=# set enable_seqscan to false;
=# select * from t where b &< '((100,200),(300,400))'::box;
b
---
(0 rows)

Index scan excludes the row and it obviously is not a difference
comes from error nor tolerance.

The current overLeft4D is equivalent to the following.

| FPlt(rect_box->range_box_x->left.high, query->right.high) &&
| FPlt(rect_box->range_box_x->right.high, query->right.high)

This is translated into the BOX context as the following.

| FPlt(range high(low.x), query->high.y) &&
| FPlt(range high(high.x), query->high.y)

Yes, it is obviously wrong as your report.

The code part of the v02 patch looks very good than the previous
one. It is exactly what I had in my mind. Thanks.

The following comment,

> /* Can any range from range_box to be overlower than this argument? */

This might be better to be using the same wording to its users,
for example the comment for overLeft4D is the following.

| /* Can any rectangle from rect_box does not extend the right of this argument? */

And the documentation uses the following sentsnce,

https://www.postgresql.org/docs/current/static/functions-geometry.html
| Does not extend to the right of?

So I think "Can any range from range_box does not extend the
upper of this argument?" is better than the overlower. What do
you think?

> > counting them in this way is unstable. Even though this were
> > stable due to a fixed initial, this would be unacceptable, I
> > think. This kind of test should use nonrandom numbers.
>
> I have replaced random data in this test with stable uniformly
> distributed data.
> I would also like to use existing box data from rect.data that is
> loaded to
> slow_emp4000 table in copy.sql test, but box.sql test is executed
> before copy.sql.

Nonrandom data is good. I'd like to compare the result of
corresnponding queries but unfortunately we don't have outer-join
on boxes. Anyway the test is enough since it detects this bug.

I confirmed other functions in geo_spgist but found no bugs like
this.

> > Even though I don't understand this in depth, the following
> > change seems somewhat wrong in direction. Changing the second
> > argument type seems breaking the basis of the design.
> >
> > | -lower2D(RangeBox *range_box, Range *query)
> > | +lower2D(RangeBox *range_box, double query)
>
> Maybe. I also thought that these changes are quite doubtful, so I
> decided to
> replace lowerEqual2D()/higherEqual2D() with
> overlower2D()/overhigher2D() and
> not to touch lower2D()/higher2D().

Thanks, this looks bery good.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2017-01-31 10:06:09 Re: Radix tree for character conversion
Previous Message Konstantin Knizhnik 2017-01-31 09:42:31 Re: logical decoding of two-phase transactions