Re: refactoring - share str2*int64 functions

From: Andres Freund <andres(at)anarazel(dot)de>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: refactoring - share str2*int64 functions
Date: 2019-09-09 10:17:38
Message-ID: 20190909101738.3qadlljsucxvf2kd@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-09-09 14:28:14 +0900, Michael Paquier wrote:
> On Thu, Sep 05, 2019 at 03:52:48PM +0900, Michael Paquier wrote:
> > Right, there was this part. This brings also the point of having one
> > interface for the backend as all the error messages for the backend
> > are actually the same, with the most simple name being that:
> > pg_strtoint(value, size, error_ok).

I *VEHEMENTLY* oppose the introduction of any future pseudo-generic
routines taking the type width as a parameter. They're weakening the
type system and they're unnecessarily inefficient.

I don't really buy that saving a few copies of a strings is worth that
much. But if you really want to do that, the right approach imo would be
to move the error reporting into a separate function. I.e. something

void pg_attribute_noreturn()
pg_strtoint_error(pg_strtoint_status status, const char *input, const char *type)

that you'd call in small wrappers. Something like

static inline int32
pg_strtoint32_check(const char* s)
{
int32 result;
pg_strtoint_status status = pg_strtoint32(s, &result);

if (unlikely(status == PG_STRTOINT_OK))
pg_strtoint_error(status, s, "int32");
return result;
}

> So, it seems to me that if we want to have a consolidation of
> everything, we basically need to have the generic error messages for
> the backend directly into common/string.c where the routines are
> refactored, and I think that we should do the following based on the
> patch attached:

> - Just remove pg_strtoint(), and move the error generation logic in
> common/string.c.

I'm not quite sure what you mean by moving the "error generation logic"?

> - Add an error_ok flag for all the pg_strto[u]int{16|32|64} routines,
> which generates errors only for FRONTEND is not defined.

I think this is a bad idea.

> - Use pg_log_error() for the frontend errors.
>
> If those errors are added everywhere, we would basically have no code
> paths in the frontend or the backend (for the unsigned part only)
> which use them yet. Another possibility would be to ignore the
> error_ok flag in those cases, but that's inconsistent.

Yea, ignoring it would be terrible idea.

> diff --git a/src/backend/libpq/pqmq.c b/src/backend/libpq/pqmq.c
> index a9bd47d937..593a5ef54e 100644
> --- a/src/backend/libpq/pqmq.c
> +++ b/src/backend/libpq/pqmq.c
> @@ -286,10 +286,10 @@ pq_parse_errornotice(StringInfo msg, ErrorData *edata)
> edata->hint = pstrdup(value);
> break;
> case PG_DIAG_STATEMENT_POSITION:
> - edata->cursorpos = pg_strtoint32(value);
> + edata->cursorpos = pg_strtoint(value, 4);
> break;

I'd be really upset if this type of change went in.

> #include "fmgr.h"
> #include "miscadmin.h"
> +#include "common/string.h"
> #include "nodes/extensible.h"
> #include "nodes/parsenodes.h"
> #include "nodes/plannodes.h"
> @@ -80,7 +81,7 @@
> #define READ_UINT64_FIELD(fldname) \
> token = pg_strtok(&length); /* skip :fldname */ \
> token = pg_strtok(&length); /* get field value */ \
> - local_node->fldname = pg_strtouint64(token, NULL, 10)
> + (void) pg_strtouint64(token, &local_node->fldname)

Seems like these actually could just ought to use the error-checked
variants. And I think it ought to change all of
READ_{INT,UINT,LONG,UINT64,OID}_FIELD, rather than just redirecting one
of them to the new routines.

> static void pcb_error_callback(void *arg);
> @@ -496,7 +496,7 @@ make_const(ParseState *pstate, Value *value, int location)
>
> case T_Float:
> /* could be an oversize integer as well as a float ... */
> - if (scanint8(strVal(value), true, &val64))
> + if (pg_strtoint64(strVal(value), &val64) == PG_STRTOINT_OK)
> {
> /*
> * It might actually fit in int32. Probably only INT_MIN can

Not for this change, but we really ought to move away from this crappy
logic. It's really bonkers to have T_Float represent large integers and
floats.

> +/*
> + * pg_strtoint16
> + *
> + * Convert input string to a signed 16-bit integer. Allows any number of
> + * leading or trailing whitespace characters.
> + *
> + * NB: Accumulate input as a negative number, to deal with two's complement
> + * representation of the most negative number, which can't be represented as a
> + * positive number.
> + *
> + * The function returns immediately if the conversion failed with a status
> + * value to let the caller handle the error. On success, the result is
> + * stored in "*result".
> + */
> +pg_strtoint_status
> +pg_strtoint16(const char *s, int16 *result)
> +{
> + const char *ptr = s;
> + int16 tmp = 0;
> + bool neg = false;
> +
> + /* skip leading spaces */
> + while (likely(*ptr) && isspace((unsigned char) *ptr))
> + ptr++;
> +
> + /* handle sign */
> + if (*ptr == '-')
> + {
> + ptr++;
> + neg = true;
> + }
> + else if (*ptr == '+')
> + ptr++;

> + /* require at least one digit */
> + if (unlikely(!isdigit((unsigned char) *ptr)))
> + return PG_STRTOINT_SYNTAX_ERROR;

Wonder if there's an argument for moving this behaviour to someplace
else - in most cases we really don't expect whitespace, and checking for
it is unnecessary overhead.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Antonin Houska 2019-09-09 10:20:30 Re: Attempt to consolidate reading of XLOG page
Previous Message fn ln 2019-09-09 09:32:09 Re: BUG #15977: Inconsistent behavior in chained transactions