Re: Fix overflow in DecodeInterval

From: Joseph Koshakow <koshy44(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fix overflow in DecodeInterval
Date: 2022-03-06 16:14:25
Message-ID: CAAvxfHd+8GAZSPxcuJa+Z3TG2WoPUmJ-ER9NxjAhKwQaDzgdKg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi All,

Sorry for the delay in the new patch, I've attached my most recent
patch to this email. I ended up reworking a good portion of my previous
patch so below I've included some reasons why, notes on my current
approach, and some pro/cons to the approach.

* The main reason for the rework had to do with double conversions and
shared code.

* The existing code for rounding had a lot of int to double
casting and vice versa. I *think* that doubles are able to completely
represent the range of ints. However doubles are not able to represent
the full range of int64. After making the change I started noticing
a lot of lossy behavior. One thought I had was to change the doubles
to long doubles, but I wasn't able to figure out if long doubles could
completely represent the range of int64. Especially since their size
varies depending on the architecture. Does anyone know the answer to
this?

* I ended up creating two intermediate data structures for Intervals.
One for decoding and one for everything else. I'll go into more detail
below.
* One common benefit was that they both contain a usec field which
means that the Interval methods no longer need to carry around a
separate fsec argument.
* The obvious con here is that Intervals require two unique
intermediate data structures, while all other date/time types
can share a single intermediate data structure. I find this to
be a bit clunky.

* pg_itm_in is the struct used for Interval decoding. It's very similar
to pg_tm, except all of the time related fields are collapsed into a
single `int64 usec` field.
* The biggest benefit of this was that all int64-double conversions
are limited to a single function, AdjustFractMicroseconds. Instead
of fractional units flowing down over every single time field, they
only need to flow down into the single `int64 usec` field.
* Overflows are caught much earlier in the decoding process which
helps avoid wasted work.
* I found that the decoding code became simpler for time fields,
though this is a bit subjective.

* pg_itm is the struct used for all other Interval functionality. It's
very similar to pg_tm, except the tm_hour field is converted from int
to int64 and an `int tm_usec` field was added.
* When encoding and working with Intervals, we almost always want
to break the time field out into hours, min, sec, usec. So it's
helpful to have a common place to do this, instead of every
function duplicating this code.
* When breaking the time fields out, a single field will never
contain a value greater than could have fit in the next unit
higher. Meaning that minutes will never be greater than 60, seconds
will be never greater than 60, and usec will never be greater than
1,000. So hours is actually the only field that needs to be int64
and the rest can be an int.
* This also helps limit the impact to shared code (see below).

* There's some shared code between Intervals and other date/time types.
Specifically the DecodeTime function and the datetime_to_char_body
function. These functions take in a `struct pg_tm` and a `fsec_t fsec`
(fsec_t is just an alias for int32) which allows them to be re-used by
all date/time types. The only difference now between pg_tm and pg_itm
is the tm_hour field size (the tm_usec field in pg_itm can be used as
the fsec). So to get around this I changed the function signatures to
take a `struct pg_tm`, `fsec_t fsec`, and an `int64 hour` argument.
It's up to the caller to provide to correct hour field. Intervals can
easily convert pg_itm to a pg_tm, fsec, and hour. It's honestly a bit
error-prone since those functions have to explicitly ignore the
pg_tm->tm_hour field and use the provided hour argument instead, but I
couldn't think of a better less intrusive solution. If anyone has a
better idea, please don't hesitate to bring it up.

* This partly existed in the previous patch, but I just wanted to
restate it. All modifications to pg_itm_in during decoding is done via
helper functions that check for overflow. All invocations of these
functions either return an error on overflow or explicitly state why an
overflow is impossible.

* I completely rewrote the ParseISO8601Number function to try and avoid
double to int64 conversions. I tried to model it after the parsing done
in DecodeInterval, though I would appreciate extra scrutiny here.

- Joe Koshakow

Attachment Content-Type Size
v8-0001-Check-for-overflow-when-decoding-an-interval.patch text/x-patch 102.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2022-03-06 16:16:54 Re: Adding CI to our tree (ccache)
Previous Message Robert Haas 2022-03-06 15:27:40 Re: role self-revocation