Re: refactoring - share str2*int64 functions

From: Andres Freund <andres(at)anarazel(dot)de>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: refactoring - share str2*int64 functions
Date: 2019-09-04 09:08:39
Message-ID: 20190904090839.stp3madovtynq3px@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-09-03 20:10:37 +0200, Fabien COELHO wrote:
> @@ -113,7 +113,7 @@ parse_output_parameters(List *options, uint32 *protocol_version,
> errmsg("conflicting or redundant options")));
> protocol_version_given = true;
>
> - if (!scanint8(strVal(defel->arg), true, &parsed))
> + if (unlikely(pg_strtoint64(strVal(defel->arg), &parsed) != PG_STRTOINT_OK))
> ereport(ERROR,
> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> errmsg("invalid proto_version")));

Unexcited about adding unlikely() to any place that's not a hot path -
which this certainly is not.

But I also wonder if we shouldn't just put the branch likelihood of
pg_strtoint64 not failing into one central location. E.g. something like

static inline pg_strtoint_status
pg_strtoint64(const char *str, int64 *result)
{
pg_strtoint_status status;

status = pg_strtoint64_impl(str, result);

likely(status == PG_STRTOINT_OK);

return status;
}

I've not tested this, but IIRC that should be sufficient to propagate
that knowledge up to callers.

> + if (likely(stat == PG_STRTOINT_OK))
> + return true;
> + else if (stat == PG_STRTOINT_RANGE_ERROR)
> {
> - /* could fail if input is most negative number */
> - if (unlikely(tmp == PG_INT64_MIN))
> - goto out_of_range;
> - tmp = -tmp;
> - }
> -
> - *result = tmp;
> - return true;
> -
> -out_of_range:
> - if (!errorOK)
> ereport(ERROR,
> (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
> errmsg("value \"%s\" is out of range for type %s",
> str, "bigint")));
> - return false;
> -
> -invalid_syntax:
> - if (!errorOK)
> + return false;
> + }
> + else if (stat == PG_STRTOINT_SYNTAX_ERROR)
> + {
> ereport(ERROR,
> (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
> errmsg("invalid input syntax for type %s: \"%s\"",
> "bigint", str)));
> - return false;
> + return false;
> + }
> + else
> + /* cannot get here */
> + Assert(0);

I'd write this as a switch over the enum - that way we'll get a
compile-time warning if we're not handling a value.

> +static bool
> +str2int64(const char *str, int64 *val)
> +{
> + pg_strtoint_status stat = pg_strtoint64(str, val);
> +

I find it weird to have a wrapper that's named 'str2...' that then calls
'strto' to implement itself.

> +/*
> + * pg_strtoint64 -- convert a string to 64-bit integer
> + *
> + * The function returns if the conversion failed, or
> + * "*result" is set to the result.
> + */
> +pg_strtoint_status
> +pg_strtoint64(const char *str, int64 *result)
> +{
> + const char *ptr = str;
> + int64 tmp = 0;
> + bool neg = false;
> +
> + /*
> + * Do our own scan, rather than relying on sscanf which might be broken
> + * for long long.
> + *
> + * As INT64_MIN can't be stored as a positive 64 bit integer, accumulate
> + * value as a negative number.
> + */
> +
> + /* skip leading spaces */
> + while (isspace((unsigned char) *ptr))
> + ptr++;
> +
> + /* handle sign */
> + if (*ptr == '-')
> + {
> + ptr++;
> + neg = true;
> + }
> + else if (*ptr == '+')
> + ptr++;
> +
> + /* require at least one digit */
> + if (unlikely(!isdigit((unsigned char) *ptr)))
> + return PG_STRTOINT_SYNTAX_ERROR;
> +
> + /* process digits, we know that we have one ahead */
> + do
> + {
> + int64 digit = (*ptr++ - '0');
> +
> + if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
> + unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
> + return PG_STRTOINT_RANGE_ERROR;
> + }
> + while (isdigit((unsigned char) *ptr));

Hm. If we're concerned about the cost of isdigit etc - and I think
that's reasonable, after looking at their implementation in glibc (it
performs a lookup in a global table to handle potential locale changes)
- I think we ought to just not use the provided isdigit, and probably
not isspace either. This code effectively relies on the input being
ascii anyway, so we don't need any locale specific behaviour afaict
(we'd e.g. get wrong results if isdigit() returned true for something
other than the ascii chars).

To me the generated code looks considerably better if I use something
like

static inline bool
pg_isdigit(char c)
{
return c >= '0' && c <= '9';
}

static inline bool
pg_isspace(char c)
{
return c == ' ';
}

(if we were to actually go for this, we'd probably want to add some docs
that we don't expect EOF, or make the code safe against that).

I've not benchmarked that, but I'd be surprised if it didn't improve
matters.

And once coded using the above, there's no point in the do/while
conversion imo, as any compiler can trivially optimize redundant checks.

> +/*
> + * pg_strtouint64 -- convert a string to unsigned 64-bit integer
> + *
> + * The function returns if the conversion failed, or
> + * "*result" is set to the result.
> + */
> +pg_strtoint_status
> +pg_strtouint64(const char *str, uint64 *result)
> +{
> + const char *ptr = str;
> + uint64 tmp = 0;
> +
> + /* skip leading spaces */
> + while (isspace((unsigned char) *ptr))
> + ptr++;
> +
> + /* handle sign */
> + if (*ptr == '+')
> + ptr++;
> + else if (unlikely(*ptr == '-'))
> + return PG_STRTOINT_SYNTAX_ERROR;

Hm. Seems like that should return PG_STRTOINT_RANGE_ERROR?

> +typedef enum {

Please don't define anonyous types (the enum itself, which now is only
reachable via the typedef). Also, there's a missing newline here).

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2019-09-04 09:09:27 Re: pglz performance
Previous Message Richard Guo 2019-09-04 08:29:35 Re: d25ea01275 and partitionwise join