Re: Infinite Interval

From: Joseph Koshakow <koshy44(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Infinite Interval
Date: 2023-03-18 18:48:33
Message-ID: CAAvxfHdNDLJZJoPcAo_52sLxEMhqhm8-Bod31qF36puUtJZVnQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 9, 2023 at 12:42 PM Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
wrote:
>
>
> static pg_tz *FetchDynamicTimeZone(TimeZoneAbbrevTable *tbl, const
datetkn *tp,
> - DateTimeErrorExtra *extra);
> + DateTimeErrorExtra * extra);
>
> There are a lot of these diffs. PG code doesn't leave an extra space
between
> variable name and *.

Those appeared from running pg_indent. I've removed them all.

> /* Handle the integer part */
> - if (!int64_multiply_add(val, scale, &itm_in->tm_usec))
> + if (pg_mul_add_s64_overflow(val, scale, &itm_in->tm_usec))
>
> I think this is a good change, since we are moving the function to
int.h where
> it belongs. We could separate these kind of changes into another patch
for easy
> review.

I've separated this out into another patch attached to this email.
Should I start a new email thread or is it ok to include it in this
one?

> +
> + result->day = days;
> + if (pg_mul_add_s32_overflow(weeks, 7, &result->day))
> + ereport(ERROR,
> + (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
> + errmsg("interval out of range")));
>
> I think such changes are also good, but probably a separate patch for
ease of
> review.

I've separated a patch for this too, which I've also included in this
email.

> secs = rint(secs * USECS_PER_SEC);
> - result->time = hours mj* ((int64) SECS_PER_HOUR * USECS_PER_SEC) +
> - mins * ((int64) SECS_PER_MINUTE * USECS_PER_SEC) +
> - (int64) secs;
> +
> + result->time = secs;
> + if (pg_mul_add_s64_overflow(mins, ((int64) SECS_PER_MINUTE *
> USECS_PER_SEC), &result->time))
> + ereport(ERROR,
> + (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
> + errmsg("interval out of range")));
> + if (pg_mul_add_s64_overflow(hours, ((int64) SECS_PER_HOUR *
> USECS_PER_SEC), &result->time))
> + ereport(ERROR,
> + (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
> + errmsg("interval out of range")));
>
> shouldn't this be
> secs = rint(secs);
> result->time = 0;
> pg_mul_add_s64_overflow(secs, USECS_PER_SEC, &result->time) to
catch
> overflow error early?

The problem is that `secs = rint(secs)` rounds the seconds too early
and loses any fractional seconds. Do we have an overflow detecting
multiplication function for floats?

> + if TIMESTAMP_IS_NOBEGIN
> + (dt2)
>
> Better be written as if (TIMESTAMP_IS_NOBEGIN(dt2))? There are more
corrections
> like this.

I think this may have also been done by pg_indent, I've reverted all
the examples of this.

> + if (INTERVAL_NOT_FINITE(result))
> + ereport(ERROR,
> + (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
> + errmsg("interval out of range")));
>
> Probably, I added these kind of checks. But I don't remember if those
are
> defensive checks or whether it's really possible that the arithmetic
above
> these lines can yield an non-finite interval.

These checks appear in `make_interval`, `justify_X`,
`interval_um_internal`, `interval_pl`, `interval_mi`, `interval_mul`,
`interval_div`. For all of these it's possible that the interval
overflows/underflows the non-finite ranges, but does not
overflow/underflow the data type. For example
`SELECT INTERVAL '2147483646 months' + INTERVAL '1 month'` would error
on this check.

> + else
> + {
> + result->time = -interval->time;
> + result->day = -interval->day;
> + result->month = -interval->month;
> +
> + if (INTERVAL_NOT_FINITE(result))
> + ereport(ERROR,
> + (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
> + errmsg("interval out of range")));
>
> If this error ever gets to the user, it could be confusing. Can we
elaborate by
> adding context e.g. errcontext("while negating an interval") or some
such?

Done.

> -
> - result->time = -interval->time;
> - /* overflow check copied from int4um */
> - if (interval->time != 0 && SAMESIGN(result->time, interval->time))
> - ereport(ERROR,
> - (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
> - errmsg("interval out of range")));
> - result->day = -interval->day;
> - if (interval->day != 0 && SAMESIGN(result->day, interval->day))
> - ereport(ERROR,
> - (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
> - errmsg("interval out of range")));
> - result->month = -interval->month;
> - if (interval->month != 0 && SAMESIGN(result->month,
interval->month))
> - ereport(ERROR,
> - (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
> - errmsg("interval out of range")));
> + interval_um_internal(interval, result);
>
> Shouldn't we incorporate these checks in interval_um_internal()? I
don't think
> INTERVAL_NOT_FINITE() covers all of those.

I replaced these checks with the following:

+ else if (interval->time == PG_INT64_MIN || interval->day == PG_INT32_MIN
|| interval->month == PG_INT32_MIN)
+ ereport(ERROR,
+ (errcode(ERRCODE_DATETIME_VALUE_OUT_OF_RANGE),
+ errmsg("interval out of range")));

I think this covers the same overflow check but is maybe a bit more
obvious. Unless, there's something I'm missing?

> + /*
> + * Subtracting two infinite intervals with different signs
results in an
> + * infinite interval with the same sign as the left operand.
Subtracting
> + * two infinte intervals with the same sign results in an error.
> + */
>
> I think we need someone to validate these assumptions and similar
assumptions
> in interval_pl(). Googling gives confusing results in some cases. I
have not
> looked for IEEE standard around this specificallly.

I used to Python and Rust to check my assumptions on the IEEE standard:

Python:
>>> float('inf') + float('inf')
inf
>>> float('-inf') + float('inf')
nan
>>> float('inf') + float('-inf')
nan
>>> float('-inf') + float('-inf')
-inf

>>> float('inf') - float('inf')
nan
>>> float('-inf') - float('inf')
-inf
>>> float('inf') - float('-inf')
inf
>>> float('-inf') - float('-inf')
nan

Rust:
inf + inf = inf
-inf + inf = NaN
inf + -inf = NaN
-inf + -inf = -inf

inf - inf = NaN
-inf - inf = -inf
inf - -inf = inf
-inf - -inf = NaN

I'll try and look up the actual standard and see what it says.

> + if (INTERVAL_NOT_FINITE(interval))
> + {
> + double r = NonFiniteIntervalPart(type, val, lowunits,
> +
INTERVAL_IS_NOBEGIN(interval),
> + false);
> +
> + if (r)
>
> I see that this code is very similar to the corresponding code in
timestamp and
> timestamptz, so it's bound to be correct. But I always thought float
equality
> is unreliable. if (r) is equivalent to if (r == 0.0) so it will not
work as
> intended. But may be (float) 0.0 is a special value for which equality
holds
> true.

I'm not familiar with float equality being unreliable, but I'm by no
means a C or float expert. Can you link me to some docs/explanation?

> +static inline bool
> +pg_mul_add_s64_overflow(int64 val, int64 multiplier, int64 *sum)
>
> I think this needs a prologue similar to int64_multiply_add(), that
the patch
> removes. Similarly for pg_mul_add_s32_overflow().

I've added this to the first patch.

Thanks for the review! Sorry for the delayed response.

- Joe Koshakow

Attachment Content-Type Size
0001-Move-integer-helper-function-to-int.h.patch text/x-patch 3.3 KB
0002-Check-for-overflow-in-make_interval.patch text/x-patch 4.2 KB
0003-Add-infinite-interval-values.patch text/x-patch 89.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-03-18 19:08:33 Re: Infinite Interval
Previous Message Justin Pryzby 2023-03-18 18:30:38 Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode