Re: Avoid overflow (src/backend/utils/adt/formatting.c)

From: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
To: Bryan Green <dbryan(dot)green(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Avoid overflow (src/backend/utils/adt/formatting.c)
Date: 2025-11-03 00:34:11
Message-ID: CAEudQAoEQRDL-oD5nU6xgVv=x_t1bzctSu3iaSuphFd5hKh67g@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Em dom., 2 de nov. de 2025 às 13:22, Bryan Green <dbryan(dot)green(at)gmail(dot)com>
escreveu:

> On 11/2/2025 10:09 AM, Bryan Green wrote:
> > On 11/2/2025 8:59 AM, Ranier Vilela wrote:
> >> Hi.
> >>
> >> Per Coverity.
> >>
> >> Coverity raised the follow report:
> >>
> >> CID 1642824: (#1 of 1): Overflowed constant (INTEGER_OVERFLOW)
> >> 37. overflow_const: Expression pattern_len - 1ULL, where pattern_len is
> >> known to be equal to 0, underflows the type of pattern_len - 1ULL,
> which
> >> is type unsigned long long.
> >>
> >> This is because the function *pg_mbstrlen* can return zero.
> >> And the comment is clear, *just in case there are MB chars*.
> >>
> >> patch attached.
> >>
> >> best regards,
> >> Ranier Vilela
> >
> > Thanks for finding and patching this. I think you're absolutely
> > right that there's a latent bug here, and your fix is appropriate.
> > However, I wanted to share some thoughts on why this has probably
> > never been reported in the wild, and why we should apply the patch
> > anyway.
> >
> > The crash requires a specific and rather unlikely combination of
> > circumstances:
> >
> > 1. You need a locale where lconv->thousands_sep is present but
> > empty. Most locales either provide a real separator (",", ".",
> > or " ") or return NULL (in which case NUM_prepare_locale's
> > fallback logic provides a default). The comments mention
> > "broken glibc pt_BR" as an example, but even that's been fixed
> > in most glibc versions for years now.
> >
> > 2. You need a format string containing 'G', which requests
> > insertion of the locale's thousands separator.
> >
> > Here's the thing: if your locale has no thousands separator, why
> > would you write a format string asking for one? Consider:
> >
> > -- In a locale with thousands_sep = ","
> > SELECT to_char(1234567.89, '9G999G999.99');
> > -- Result: "1,234,567.89"
> >
> > -- In a locale with empty thousands_sep
> > SELECT to_char(1234567.89, '9G999G999.99');
> > -- Result: "1234567.89" (the G's insert nothing)
> >
> > If you're working in a locale that doesn't have a thousands
> > separator, you're most likely culturally aware of that fact,
> > and you simply wouldn't include 'G' in your format strings.
> > It would be like writing code to format something your locale
> > doesn't have a concept of.
> >
> > So I think the bug has survived this long because:
> > a) Very few locales have truly empty thousands_sep strings
> > b) Users of those locales naturally avoid the 'G' format code
> > c) Even accidental use would require the right fill-mode
> > settings
> >
> > That said, in my opinion we should definitely apply your patch,
> > for several reasons:
> >
> > First, it's undefined behavior. Backing up the pointer via
> > "ptr += -1" when ptr is at the start of allocated memory is a
> > serious bug, regardless of how unlikely the trigger condition is.
> > We don't want that lurking in the codebase.
> >
> > Second, even if this hasn't been triggered in 20+ years, it's
> > exactly the kind of thing that AFL or similar tools might find.
> > Better to fix it before we get a CVE filing.
> >
> > Third, it's a trivial fix. The performance impact is nil, and it
> > makes the code more obviously correct. There's really no
> > downside.
> >
> > I'd suggest one minor adjustment to your patch: add a comment
> > explaining why we're checking for empty pattern, since it's
> > non-obvious:
> >
> > /*
> > * Guard against empty separator (could happen with some
> > * locales)
> > */
> > if (pattern_len > 0)
> > {
> > ...
> > }
> >
> > That'll help future maintainers understand this isn't just
> > defensive programming paranoia.
> >
> > Good catch on finding this. We'll have to see if others agree...
> >
> > Bryan Green
> I did a bit more analysis to double check myself and realized I missed
> an important check:
>
> NUM_prepare_locale() has a safety check:
>
> if (lconv->thousands_sep && *lconv->thousands_sep)
> Np->L_thousands_sep = lconv->thousands_sep;
> else if (strcmp(Np->decimal, ",") != 0)
> Np->L_thousands_sep = ",";
> else
> Np->L_thousands_sep = ".";
>
> The "*lconv->thousands_sep" test specifically prevents empty
> strings from being used, so Np->L_thousands_sep can never be empty in
> production.
>
The question is that the *pattern* is not empty.
But that can not contain MB characters.
In this case, not contain any MB chars, pg_mbstrlen
will return zero and overflow can happen.

best regards,
Ranier Vilela

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Philip Alger 2025-11-03 01:18:44 Re: [PATCH] Add pg_get_trigger_ddl() to retrieve the CREATE TRIGGER statement
Previous Message Ranier Vilela 2025-11-03 00:31:25 Re: Avoid overflow (src/backend/utils/adt/formatting.c)