Re: [PATCH]: fix bug in SP-GiST box_ops

From: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH]: fix bug in SP-GiST box_ops
Date: 2017-01-30 18:12:31
Message-ID: 4450e7a6-01e7-0fb2-a01e-98fb5405d217@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 30.01.2017 12:04, Kyotaro HORIGUCHI wrote:
> Hello,
>
> At Mon, 30 Jan 2017 07:12:03 +0300, Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru> wrote
> in <9ea5b157-478c-8874-bc9b-050076b7d042(at)postgrespro(dot)ru>:
>> Working on the tests for new SP-GiST opclasses for polygons and
>> circles, I've found a bug in the SP-GiST box_ops (added in 9.6): some operators
>> (&<, &>, $<|, |&>) have wrong tests in spg_box_quad_inner_consistent().
>> This obviously leads to incorrect results of a SP-GiST index scan (see tests
>> in the attached patch, their results were taken from a sequential scan).
>
> Your problem is not necessarily evident for others. Perhaps you
> have to provide an explanation and/or a test case that describes
> what is wrong for you so that others can get a clue for this
> problem. Simpler test is better.

The problem is that there are mixed low/high values for x/y axes. For example,
overLeft4D() compares x-RangeBox rect_box->range_box_x with y-Range &query->right,
and also lower2D() here must use query->high instead of query->low.

The corresponding test is very simple: insert 10000 nearly arbitrary boxes and
see the difference between results of sequential scan and results of index scan.

regression=# CREATE TEMPORARY TABLE quad_box_tbl (b box);

regression=# INSERT INTO quad_box_tbl
SELECT box(point(x * 10, y * 10), point(x * 10 + 5, y * 10 + 5))
FROM generate_series(1, 100) x,
generate_series(1, 100) y;

regression=# CREATE INDEX quad_box_tbl_idx ON quad_box_tbl USING spgist(b);

regression=# SET enable_seqscan = ON;
regression=# SET enable_indexscan = OFF;
regression=# SET enable_bitmapscan = OFF;

regression=# SELECT count(*) FROM quad_box_tbl WHERE b &< box '((100,200),(300,500))';
count
-------
2900
(1 row)

regression=# SELECT count(*) FROM quad_box_tbl WHERE b &> box '((100,200),(300,500))';
count
-------
9100
(1 row)

regression=# SELECT count(*) FROM quad_box_tbl WHERE b &<| box '((100,200),(300,500))';
count
-------
4900
(1 row)

regression=# SELECT count(*) FROM quad_box_tbl WHERE b |&> box '((100,200),(300,500))';
count
-------
8100
(1 row)

regression=# SET enable_seqscan = OFF;
regression=# SET enable_indexscan = ON;
regression=# SET enable_bitmapscan = ON;

regression=# SELECT count(*) FROM quad_box_tbl WHERE b &< box '((100,200),(300,500))';
count
-------
2430
(1 row)

regression=# SELECT count(*) FROM quad_box_tbl WHERE b &> box '((100,200),(300,500))';
count
-------
6208
(1 row)

regression=# SELECT count(*) FROM quad_box_tbl WHERE b &<| box '((100,200),(300,500))';
count
-------
1048
(1 row)

regression=# SELECT count(*) FROM quad_box_tbl WHERE b |&> box '((100,200),(300,500))';
count
-------
5084
(1 row)

> The test,
>
> | +INSERT INTO quad_box_tbl
> | + SELECT i, box(point(x, y), point(x + w, y + h))
> | + FROM (SELECT i,
> | + random() * 1000 as x, random() * 1000 as y,
> | + random() * 20 as w, random() * 20 as h
>
> is inserting quad_boxes generated using random numbers then,
>
> | +SELECT count(*) FROM quad_box_tbl WHERE b << box '((100,200),(300,500))';
> | + count
> | +-------
> | + 891
>
> counting them in this way is unstable. Even though this were
> stable due to a fixed initial, this would be unacceptable, I
> think. This kind of test should use nonrandom numbers.

I have replaced random data in this test with stable uniformly distributed data.
I would also like to use existing box data from rect.data that is loaded to
slow_emp4000 table in copy.sql test, but box.sql test is executed before copy.sql.

> Even though I don't understand this in depth, the following
> change seems somewhat wrong in direction. Changing the second
> argument type seems breaking the basis of the design.
>
> | -lower2D(RangeBox *range_box, Range *query)
> | +lower2D(RangeBox *range_box, double query)

Maybe. I also thought that these changes are quite doubtful, so I decided to
replace lowerEqual2D()/higherEqual2D() with overlower2D()/overhigher2D() and
not to touch lower2D()/higher2D().

--
Nikita Glukhov
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
fix-bug-in-SP-GiST-box_ops-v02.patch text/x-patch 7.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-01-30 18:17:42 Re: Improvements in psql hooks for variables
Previous Message Alvaro Herrera 2017-01-30 18:10:58 Re: pgsql: test_pg_dump TAP test whitespace cleanup