Re: refactoring - share str2*int64 functions

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: refactoring - share str2*int64 functions
Date: 2019-07-17 07:55:39
Message-ID: alpine.DEB.2.21.1907160735480.8986@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Bonjour Michaël,

> FWIW, I was looking forward to putting my hands on this patch and try
> to get it merged so as we can get rid of those duplications. Here are
> some comments.
>
> +#ifdef FRONTEND
> + fprintf(stderr,
> + "invalid input syntax for type %s: \"%s\"\n",
> "bigint", str);
> +#else
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
> + errmsg("invalid input syntax for type %s: \"%s\"",
> + "bigint", str)));
> +#endif
> Have you looked at using the wrapper pg_log_error() here?

I have not.

I have simply merged the two implementations (pgbench & backend) as they
were.

> +extern bool pg_scanint8(const char *str, bool errorOK, int64 *result);
> +extern uint64 pg_strtouint64(const char *str, char **endptr, int base);

> Hmm. With this patch we have strtoint and pg_strtouint64, which makes
> the whole set inconsistent.

I understand that you mean bits vs bytes? Indeed it can bite!

> +
> #endif /* COMMON_STRING_H */
> Noise diff..

Indeed.

> numutils.c also has pg_strtoint16 and pg_strtoint32, so the locations
> become rather inconsistent with inconsistent APIs for the manipulation
> of int2 and int4 fields, and scanint8 is just a derivative of the same
> logic. We have two categories of routines here:

Yep, but the int2/4 functions are not used elsewhere.

> - The wrappers on top of strtol and strtoul & co, which are named
> respectively strtoint and pg_strtouint64 with the patch. The naming
> part is inconsistent, and we only handle uint64 and int32. We don't
> need to worry about int64 and uint32 because they are not used?

Indeed, it seems that they are not needed/used by client code, AFAICT.

> That's fine by me, but at least let's have a consistent naming.

Ok.

> Prefixing the functions with pg_* is a better practice in my opinion
> as we will unlikely run into conflicts this way.

Ok.

> - The str->integer conversion routines, which actually have very
> similar characteristics to the strtol families as they remove trailing
> whitespaces first, check for a sign, etc, except that they work only
> on base 10. And here we get into a state where pg_scanint8 should be
> actually called pg_strtoint64,

I just removed that:-)

ISTM that the issue is that the error handling of these functions is
pretty different.

> with an interface inconsistent with its int32/int16 relatives now only
> in the backend.

We can, but I'm not at ease with changing the error handling approach.

> Could we consider more consolidation here please? Like moving the whole
> set to src/common/?

My initial plan was simply to remove direct code duplications between
front-end and back-end, not to re-engineer the full set of string to int
conversion functions:-)

On the re-engineering front: Given the various points on the thread, ISTM
that there should probably be two behaviors for str to signed/unsigned
int{16,32,64}, and having only one kind of signature for all types would
be definitely better.

One low-level one that does the conversion or return an error.

Another higher-level one which possibly adds an error message (stderr for
front-end, log for back-end).

One choice is whether there are two functions (the higher one calling the
lower one and adding the messages) or just one with a boolean to trigger
the messages. I do not have a strong opinion. Maybe one function would be
simpler. Andres boolean-compatible enum return looks like a good option.

Overall, this leads to something like:

enum { STRTOINT_OK, STRTOINT_OVERFLOW_ERROR, STRTOINT_SYNTAX_ERROR }
pg_strto{,u}int{8?,16,32,64}
(const char * string, const TYPE * result, const bool verbose);

--
Fabien.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-07-17 07:59:02 Re: Add parallelism and glibc dependent only options to reindexdb
Previous Message Thomas Munro 2019-07-17 07:16:00 Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)