Re: Incorrect usage of strtol, atoi for non-numeric junk inputs

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: michael(at)paquier(dot)xyz
Cc: alvherre(at)alvh(dot)no-ip(dot)org, bharath(dot)rupireddyforpostgres(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Incorrect usage of strtol, atoi for non-numeric junk inputs
Date: 2021-07-21 08:02:29
Message-ID: 20210721.170229.1864410878946243716.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Thu, 15 Jul 2021 16:19:07 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in
> On Wed, Jul 14, 2021 at 11:02:47AM -0400, Alvaro Herrera wrote:
> > No, this doesn't work. When the first word is something that is
> > not to be translated (a literal parameter name), let's use a %s (but of
> > course the values must be taken out of the phrase too). But when it is
> > a translatable phrase, it must be present a full phrase, no
> > substitution:
> >
> > pg_log_error("%s must be in range %s..%s", "extra_float_digits", "-15", "3");
> > pg_log..("compression level must be in range %s..%s", "0", "9"))
> >
> > I think the purity test is whether you want to use a _() around the
> > argument, then you have to include it into the original message.
>
> After thinking about that, it seems to me that we don't have that much
> context to lose once we build those error messages to show the option
> name of the command. And the patch, as proposed, finishes with the

Agreed.

> same error message patterns all over the place, which would be a
> recipe for more inconsistencies in the future. I think that we should
> centralize all that, say with a small-ish routine in a new file called
> src/fe_utils/option_parsing.c that uses strtoint() as follows:
> bool option_parse_int(const char *optarg,
> const char *optname,
> int min_range,
> int max_range,
> int *result);
>
> Then this routine may print two types of errors through
> pg_log_error():
> - Incorrect range, using min_range/max_range.
> - Junk input.
> The boolean status is here to let the caller do any custom exit()
> actions he wishes if there one of those two failures. pg_dump has
> some of that with exit_nicely(), for one.

It is substantially the same suggestion with [1] in the original
thread. The original proposal in the old thread was

> bool pg_strtoint64_range(arg, &result, min, max, &error_message)

The difference is your suggestion is making the function output the
message within. I guess that the reason for the original proposal is
different style of message is required in several places.

Actually there are several irregulars.

1. Some "bare" options (that is not preceded by a hyphen option) like
PID of pg_ctl kill doesn't fit the format. \pset parameters of
pg_ctl is the same.

2. pg_ctl, pg_upgrade use their own error reporting mechanisms.

3. parameters that take real numbers doesn't fit the scheme specifying
range borders. For example boundary values may or may not be included
in the range.

4. Most of the errors are PG_LOG_ERROR, but a few ones are
PG_LOG_FATAL.

That being said, most of the caller sites are satisfied by
fixed-formed messages.

So I changed the interface as the following to deal with the above issues:

+extern optparse_result option_parse_int(int loglevel,
+ const char *optarg, const char *optname,
+ int min_range, int max_range,
+ int *result);

loglevel specifies the loglevel to use to emit error messages. If it
is the newly added item PG_LOG_OMIT, the function never emits an error
message. Addition to that, the return type is changed to an enum which
indicates what trouble the given string has. The caller can print
arbitrary error messages consulting the value. (killproc in pg_ctl.c)

Other points:

I added two more similar functions option_parse_long/double. The
former is a clone of _int. The latter doesn't perform explicit range
checks for the reason described above.

Maybe we need to make pg_upgraded use the common-logging features
instead of its own, but it is not included in this patch.

pgbench's -L option accepts out-of-range values for internal
variable. As the added comment says, we can limit the value with the
large exact number but I limited it to 3600s since I can't imagine
people needs larger latency limit than that.

Similarly to the above, -R option can take for example 1E-300, which
leads to int64 overflow later. Similar to -L, I don't think people
don't need less than 1E-5 or so as this parameter.

The attached patch needs more polish but should be enough to tell what
I have in my mind.

regards.

1: https://www.postgresql.org/message-id/CAKJS1f94kkuB_53LcEf0HF%2BuxMiTCk5FtLx9gSsXcCByUKMz1g%40mail.gmail.com

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v3-0001-Be-strict-in-numeric-parameters-on-command-line.patch text/x-patch 37.0 KB
v3-0002-Make-complain-for-invalid-numeirc-values-in-envir.patch text/x-patch 3.6 KB
v3-0003-Doc-change-for-pg_ctl.patch text/x-patch 1.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2021-07-21 08:03:53 Re: 回复: Why is XLOG_FPI_FOR_HINT always need backups?
Previous Message bucoo@sohu.com 2021-07-21 07:36:14 Re: Re: parallel distinct union and aggregate support patch