Tightening DecodeNumberField's parsing rules

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Evgeniy Gorbanev <gorbanev(dot)es(at)gmail(dot)com>
Subject: Tightening DecodeNumberField's parsing rules
Date: 2025-05-27 18:38:19
Message-ID: 1328335.1748371099@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Evgeniy Gorbanev reported to the security list that he'd found
a case where timestamp_in triggered an undefined-behavior
sanitizer warning, due to trying to store a float value larger
than INT_MAX into an integer variable. We concluded that there's
no real security issue there, it's just that the C standard
doesn't say what the result should be. So I'm moving this
discussion to the public list, because it's also not entirely
clear what the fix should be.

Evgeniy's example case was

SELECT 'j 05 t a6.424e5-'::timestamp;

which of course is garbage and draws an "invalid input syntax"
error as-expected, though not till after DecodeNumberField
has tried to stuff ".424e5 * 1000000" into an integer. However,
I found that there are closely related cases that don't draw
any syntax error, such as

regression=# SELECT timestamp with time zone 'J2452271 T X03456-08';
timestamptz
------------------------
2001-12-27 00:34:56-08
(1 row)

regression=# SELECT timestamp with time zone 'J2452271 T X03456.001e6-08';
timestamptz
------------------------
2001-12-27 00:51:36-08
(1 row)

The fundamental problem here, IMO, is that DecodeNumberField assumes
without checking that its input contains only digits and perhaps a
decimal point. In this example though, it's given whatever remains
after stripping the run-on timezone spec, that is "X03456.001e6".
That triggers both the overflow problem with the bogus ".001e6"
fraction, and a totally inappropriate reading of "X0" as "hour zero".

What I think we ought to do about this is reject strings containing
non-digits, as in the attached patch. However, there's no doubt that
that'd make timestamp_in reject some inputs it used to accept, and
we've generally tried to avoid that --- commit f0d0394e8 for instance
is a recent effort in the direction of not breaking backward
compatibility, and it adopted a position that we should not break
cases that have a sensible interpretation, even if they're surprising.
The difference here is that I don't think there's a sensible
interpretation; or if there is, it's certainly not what the code
does today.

So what I propose we do about this is to apply the attached to HEAD
and leave the back branches alone. Right now, inputs like this are
"garbage in, garbage out" but don't cause any real problem (upsetting
a sanitizer check isn't of concern for production use). So I'm not
seeing that there's a good argument for back-patching.

We could alternatively apply some more-limited fix; Evgeniy's
original recommendation was just to reject if the result of
strtod() was outside the expected 0..1 range. But I'm finding
it hard to believe that that's a great idea.

Comments?

regards, tom lane

Attachment Content-Type Size
v1-tighten-DecodeNumberField.patch text/x-diff 3.7 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shaik Mohammad Mujeeb 2025-05-27 18:41:03 Clarification on warning when connecting to 'pgbouncer' database via Pgbouncer
Previous Message David G. Johnston 2025-05-27 18:30:55 Re: Doc: section "8.21. Pseudo-Types" needs a bit of clarification?