Re: Strange behavior with polygon and NaN

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: bruce(at)momjian(dot)us
Cc: sbjesse(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Strange behavior with polygon and NaN
Date: 2020-08-26 08:25:25
Message-ID: 20200826.172525.155200749215780647.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Tue, 25 Aug 2020 19:03:50 -0400, Bruce Momjian <bruce(at)momjian(dot)us> wrote in
>
> I can confirm that this two-month old email report still produces
> different results with indexes on/off in git master, which I don't think
> is ever correct behavior.

I agree to that the behavior is broken.

> ---------------------------------------------------------------------------
>
> On Wed, Jun 24, 2020 at 03:11:03PM -0700, Jesse Zhang wrote:
> > Hi hackers,
> >
> > While working with Chris Hajas on merging Postgres 12 with Greenplum
> > Database we stumbled upon the following strange behavior in the geometry
> > type polygon:
> >
> > ------ >8 --------
> >
> > CREATE TEMP TABLE foo (p point);
> > CREATE INDEX ON foo USING gist(p);
> >
> > INSERT INTO foo VALUES ('0,0'), ('1,1'), ('NaN,NaN');
> >
> > SELECT $q$
> > SELECT * FROM foo WHERE p <@ polygon '(0,0), (0, 100), (100, 100), (100, 0)'
> > $q$ AS qry \gset
> >
> > BEGIN;
> > SAVEPOINT yolo;
> > SET LOCAL enable_seqscan TO off;
> > :qry;
> >
> > ROLLBACK TO SAVEPOINT yolo;
> > SET LOCAL enable_indexscan TO off;
> > SET LOCAL enable_bitmapscan TO off;
> > :qry;
> >
> > ------ 8< --------
> >
> > If you run the above repro SQL in HEAD (and 12, and likely all older
> > versions), you get the following output:
> >
> > CREATE TABLE
> > CREATE INDEX
> > INSERT 0 3
> > BEGIN
> > SAVEPOINT
> > SET
> > p
> > -------
> > (0,0)
> > (1,1)
> > (2 rows)
> >
> > ROLLBACK
> > SET
> > SET
> > p
> > -----------
> > (0,0)
> > (1,1)
> > (NaN,NaN)
> > (3 rows)
> >
> >
> > At first glance, you'd think this is the gist AM's bad, but on a second
> > thought, something else is strange here. The following query returns
> > true:
> >
> > SELECT point '(NaN, NaN)' <@ polygon '(0,0), (0, 100), (100, 100), (100, 0)'
> >
> > The above behavior of the "contained in" operator is surprising, and
> > it's probably not what the GiST AM is expecting. I took a look at
> > point_inside() in geo_ops.c, and it doesn't seem well equipped to handle
> > NaN. Similary ill-equipped is dist_ppoly_internal() which underlies the
> > distnace operator for polygon. It gives the following interesting
> > output:
> >
> > SELECT *, c <-> polygon '(0,0),(0,100),(100,100),(100,0)' as distance
> > FROM (
> > SELECT circle(point(100 * i, 'NaN'), 50) AS c
> > FROM generate_series(-2, 4) i
> > ) t(c)
> > ORDER BY 2;
> >
> > c | distance
> > -----------------+----------
> > <(-200,NaN),50> | 0
> > <(-100,NaN),50> | 0
> > <(0,NaN),50> | 0
> > <(100,NaN),50> | 0
> > <(200,NaN),50> | NaN
> > <(300,NaN),50> | NaN
> > <(400,NaN),50> | NaN
> > (7 rows)
> >
> > Should they all be NaN? Am I alone in thinking the index is right but
> > the operators are wrong? Or should we call the indexes wrong here?

There may be other places that NaN is not fully considered. For
example, the following (perpendicular op) returns true.

SELECT lseg '[(0,0),(0,NaN)]' ?-| lseg '[(0,0),(10,0)]';

It is quite hard to fix all of the defect..

For the above cases, it's enough to make sure that point_inside don't
pass NaN's to lseg_crossing(), but it's much of labor to fill all
holes reasonable and principled way..

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
point_inside_fix.patch text/x-patch 826 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2020-08-26 08:45:54 Re: Problems with the FSM, heap fillfactor, and temporal locality
Previous Message Andy Fan 2020-08-26 08:21:35 Re: Improve planner cost estimations for alternative subplans