Re: Wrong results from in_range() tests with infinite offset

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Wrong results from in_range() tests with infinite offset
Date: 2020-07-18 08:16:39
Message-ID: CAEZATCVQO1vUXo0PjHToxvyZ8piwH-u-mTMqMhrGku2dYMhWbw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 17 Jul 2020 at 01:59, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
> > On Thu, 16 Jul 2020, 22:50 Tom Lane, <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Actually, after staring at those results awhile longer, I decided
> >> they were wrong. The results shown here seem actually sane ---
> >> for instance, -Infinity shouldn't "infinitely precede" itself,
> >> I think. (Maybe if you got solipsistic enough you could argue
> >> that that is valid, but it seems pretty bogus.)
>
> > Hmm, that code looks a bit fishy to me, but I really need to think about it
> > some more. I'll take another look tomorrow, and maybe it'll become clearer.
>
> It's certainly verbose, so I'd like to find a more concise way to
> write the logic. But the v2 results seem right.
>

I'm finding it hard to come up with a principled argument to say
exactly what the right results should be.

As things stand (pre-patch), a window frame defined as "BETWEEN 'inf'
PRECEDING AND 'inf' PRECEDING", produces the following:

id | f_float4 | first_value | last_value
----+-----------+-------------+------------
0 | -Infinity | |
1 | -3 | |
2 | -1 | |
3 | 0 | |
4 | 1.1 | |
5 | 1.12 | |
6 | 2 | |
7 | 100 | |
8 | Infinity | |
9 | NaN | 9 | 9
(10 rows)

which is clearly wrong, because -Inf obviously infinitely precedes all
the other (non-NaN) values.

With the first version of the patch, that result became

id | f_float4 | first_value | last_value
----+-----------+-------------+------------
0 | -Infinity | 0 | 0
1 | -3 | 0 | 0
2 | -1 | 0 | 0
3 | 0 | 0 | 0
4 | 1.1 | 0 | 0
5 | 1.12 | 0 | 0
6 | 2 | 0 | 0
7 | 100 | 0 | 0
8 | Infinity | 0 | 0
9 | NaN | 9 | 9
(10 rows)

which is definitely better, but the one obvious problem is last_value
for id=8, because all the values in earlier rows infinitely precede
+Inf, so they should be included in the window frame for that row.

With the second version of the patch, the result is

id | f_float4 | first_value | last_value
----+-----------+-------------+------------
0 | -Infinity | |
1 | -3 | 0 | 0
2 | -1 | 0 | 0
3 | 0 | 0 | 0
4 | 1.1 | 0 | 0
5 | 1.12 | 0 | 0
6 | 2 | 0 | 0
7 | 100 | 0 | 0
8 | Infinity | 0 | 7
9 | NaN | 9 | 9
(10 rows)

That fixes last_value for id=8, using the fact that all values less
than +Inf infinitely precede it, and also assuming that +Inf does not
infinitely precede itself, which seems reasonable.

The other change is in the first row, because it now assumes that -Inf
doesn't infinitely precede itself, which seems reasonable from a
consistency point of view.

However, that is also a bit odd because it goes against the documented
contract of in_range(), which is supposed to do the tests

val <= base +/- offset1
val >= base +/- offset2

which for "BETWEEN 'inf' PRECEDING AND 'inf' PRECEDING" become

val = base - Inf

which is -Inf, even if base = -Inf. So I'd say that the window
infinitely preceding -Inf contains -Inf, since -Inf - Inf = -Inf.

But if -Inf infinitely precedes -Inf, it probably also makes sense to
say that +Inf infinitely precedes +Inf for consistency, even though
that really isn't well-defined, since Inf - Inf = NaN. Doing that is
certainly a lot easier to code, because it just needs to return true
if base +/- offset would be NaN, i.e.,

/*
* Deal with cases where both base and offset are infinite, and computing
* base +/- offset would produce NaN. This corresponds to a window frame
* whose boundary infinitely precedes +inf or infinitely follows -inf,
* which is not well-defined. For consistency with other cases involving
* infinities, such as the fact that +inf infinitely follows +inf, we
* choose to assume that +inf infinitely precedes +inf and -inf infinitely
* follows -inf, and therefore that all finite and infinite values are in
* such a window frame.
*/
if (isinf(base) && isinf(offset))
{
if ((base > 0 && sub) || (base < 0 && !sub))
PG_RETURN_BOOL(true);
}

and the result is

id | f_float8 | first_value | last_value
----+-----------+-------------+------------
0 | -Infinity | 0 | 0
1 | -3 | 0 | 0
2 | -1 | 0 | 0
3 | 0 | 0 | 0
4 | 1.1 | 0 | 0
5 | 1.12 | 0 | 0
6 | 2 | 0 | 0
7 | 100 | 0 | 0
8 | Infinity | 0 | 8
9 | NaN | 9 | 9
(10 rows)

which looks about equally sensible. To me, the fact that the window
infinitely preceding -Inf includes -Inf makes more sense, but the
meaning of the window infinitely preceding +Inf is much less obvious,
and not really well-defined.

Regards,
Dean

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-07-18 09:15:28 Re: Patch for reorderbuffer.c documentation.
Previous Message John Naylor 2020-07-18 08:00:12 Re: factorial function/phase out postfix operators?