Re: Bug in GiST paring heap comparator

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bug in GiST paring heap comparator
Date: 2019-09-02 06:28:02
Message-ID: 7362fcac-3664-d960-f0fa-77c24e62a7b3@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 02/09/2019 07:54, Alexander Korotkov wrote:
> Andrey Borodin noticed me that results of some KNN-GIST tests are
> obviously wrong and don't match results of sort node.
>
> SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
> f1
> -------------------
> (10,10)
> (NaN,NaN)
> (0,0)
> (1e-300,-1e-300)
> (-3,4)
> (-10,0)
> (-5,-12)
> (5.1,34.5)
>
> (1e+300,Infinity)
> (10 rows)

So we've memorized incorrect results in the expected output. Ouch!

> It appears to be related to implementation of comparison function in
> pairing heap used as priority queue for KNN. It used plain float
> comparison, which doesn't handle Inf and Nan values well. Attached
> patch replaced it with float8_cmp_internal().
>
> Also, note that with patch KNN results still don't fully match results
> of sort node.
>
> SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
> f1
> -------------------
> (0,0)
> (1e-300,-1e-300)
> (-3,4)
> (-10,0)
> (10,10)
> (-5,-12)
> (5.1,34.5)
> (1e+300,Infinity)
>
> (NaN,NaN)
> (10 rows)
>
> NULL and '(NaN,NaN)' are swapped. It happens so, because we assume
> distance to NULL to be Inf, while float8_cmp_internal() assumes NaN to
> be greater than NULL. If even we would assume distance to NULL to be
> Nan, it doesn't guarantee that NULLs would be last. It looks like we
> can handle this only by introduction array of "distance is null" flags
> to GISTSearchItem. But does it worth it?

I don't think we have much choice, returning wrong results is not an
option. At first I thought we could set the "recheck" flag if we see
NULLs or NaNs, but that won't work because rechecking is not supported
with Index-Only Scans.

Looking at the corresponding SP-GiST code,
pairingheap_SpGistSearchItem_cmp() gets this right. I'd suggest
copy-pasting the implementation from there, so that they would be as
similar as possible. (SP-GiST gets away with just one isNull flag,
because it doesn't support multi-column indexes. In GiST, you need an
array, as you said.)

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2019-09-02 06:32:22 pg_dump --exclude-* options documentation
Previous Message Andrey Borodin 2019-09-02 05:11:42 Re: Bug in GiST paring heap comparator