Re: pgbench's expression parsing & negative numbers

From: Andres Freund <andres(at)anarazel(dot)de>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgbench's expression parsing & negative numbers
Date: 2018-09-26 19:48:43
Message-ID: 20180926194843.ra7rwvaezlbni4k7@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hi,

On 2018-08-10 10:24:29 +0200, Fabien COELHO wrote:
> diff --git a/src/bin/pgbench/exprscan.l b/src/bin/pgbench/exprscan.l
> index 5c1bd88128..e8faea3857 100644
> --- a/src/bin/pgbench/exprscan.l
> +++ b/src/bin/pgbench/exprscan.l
> @@ -191,16 +191,26 @@ notnull [Nn][Oo][Tt][Nn][Uu][Ll][Ll]
> yylval->bval = false;
> return BOOLEAN_CONST;
> }
> +"9223372036854775808" {
> + /* yuk: special handling for minint */
> + return MAXINT_PLUS_ONE_CONST;
> + }

Yikes, do we really need this? If we dealt with - in the lexer, we
shouldn't need it, no?

> /*
> * strtoint64 -- convert a string to 64-bit integer
> *
> - * This function is a modified version of scanint8() from
> + * This function is a slightly modified version of scanint8() from
> * src/backend/utils/adt/int8.c.
> + *
> + * The function returns whether the conversion worked, and if so
> + * "*result" is set to the result.
> + *
> + * If not errorOK, an error message is also printed out on errors.
> */
> -int64
> -strtoint64(const char *str)
> +bool
> +strtoint64(const char *str, bool errorOK, int64 *result)
> {
> const char *ptr = str;
> - int64 result = 0;
> - int sign = 1;
> + int64 tmp = 0;
> + bool neg = false;
>
> /*
> * Do our own scan, rather than relying on sscanf which might be broken
> * for long long.
> + *
> + * As INT64_MIN can't be stored as a positive 64 bit integer, accumulate
> + * value as a negative number.
> */
>
> /* skip leading spaces */
> @@ -650,46 +660,80 @@ strtoint64(const char *str)
> if (*ptr == '-')
> {
> ptr++;
> -
> - /*
> - * Do an explicit check for INT64_MIN. Ugly though this is, it's
> - * cleaner than trying to get the loop below to handle it portably.
> - */
> - if (strncmp(ptr, "9223372036854775808", 19) == 0)
> - {
> - result = PG_INT64_MIN;
> - ptr += 19;
> - goto gotdigits;
> - }
> - sign = -1;
> + neg = true;
> }
> else if (*ptr == '+')
> ptr++;
>
> /* require at least one digit */
> - if (!isdigit((unsigned char) *ptr))
> - fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", str);
> + if (unlikely(!isdigit((unsigned char) *ptr)))
> + goto invalid_syntax;
>
> /* process digits */
> while (*ptr && isdigit((unsigned char) *ptr))
> {
> - int64 tmp = result * 10 + (*ptr++ - '0');
> + int8 digit = (*ptr++ - '0');
>
> - if ((tmp / 10) != result) /* overflow? */
> - fprintf(stderr, "value \"%s\" is out of range for type bigint\n", str);
> - result = tmp;
> + if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
> + unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
> + goto out_of_range;
> }
>
> -gotdigits:
> -
> /* allow trailing whitespace, but not other trailing chars */
> while (*ptr != '\0' && isspace((unsigned char) *ptr))
> ptr++;
>
> - if (*ptr != '\0')
> - fprintf(stderr, "invalid input syntax for integer: \"%s\"\n", str);
> + if (unlikely(*ptr != '\0'))
> + goto invalid_syntax;
>
> - return ((sign < 0) ? -result : result);
> + if (!neg)
> + {
> + if (unlikely(tmp == PG_INT64_MIN))
> + goto out_of_range;
> + tmp = -tmp;
> + }
> +
> + *result = tmp;
> + return true;
> +
> +out_of_range:
> + if (!errorOK)
> + fprintf(stderr,
> + "value \"%s\" is out of range for type bigint\n", str);
> + return false;
> +
> +invalid_syntax:
> + /* this can't happen here, function called on int-looking strings. */

This comment doesn't strike me as a good idea, it's almost guaranteed to
be outdated at some point.

> + if (!errorOK)
> + fprintf(stderr,
> + "invalid input syntax for type bigint: \"%s\"\n",str);
> + return false;
> +}

Duplicating these routines is pretty ugly. I really wish we had more
infrastructure to avoid this (in particularly a portable way to report
an error and jump out). But probably out of scope for this patches?

> +/* convert string to double, detecting overflows/underflows */
> +bool
> +strtodouble(const char *str, bool errorOK, double *dv)
> +{
> + char *end;
> +
> + *dv = strtod(str, &end);
> +
> + if (unlikely(errno != 0))
> + {
> + if (!errorOK)
> + fprintf(stderr,
> + "value \"%s\" is out of range for type double\n", str);
> + return false;
> + }
> +
> + if (unlikely(end == str || *end != '\0'))
> + {
> + if (!errorOK)
> + fprintf(stderr,
> + "invalid input syntax for type double: \"%s\"\n",str);
> + return false;
> + }
> + return true;
> }

Not sure I see much point in the unlikelys here, contrasting to the ones
in the backend (and the copy for the frontend) it's extremely unlikley
anything like this will ever be close to a bottleneck. In general, I'd
strongly suggest not placing unlikelys in non-critical codepaths -
they're way too often wrongly estimated.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2018-09-26 19:49:30 Re: [PATCH] Improve geometric types
Previous Message Andres Freund 2018-09-26 19:35:25 Re: pgsql: Remove absolete function TupleDescGetSlot().