Re: refactoring - share str2*int64 functions

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: refactoring - share str2*int64 functions
Date: 2019-09-09 05:28:14
Message-ID: 20190909052814.GA26605@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 05, 2019 at 03:52:48PM +0900, Michael Paquier wrote:
> Right, there was this part. This brings also the point of having one
> interface for the backend as all the error messages for the backend
> are actually the same, with the most simple name being that:
> pg_strtoint(value, size, error_ok).

I have been looking at that for the last couple of days. First, I
have consolidated all the error strings in a single routine like this
one, except that error_ok is not really needed if you take this
approach: callers that don't care about failures could just call the
set of routines in common/string.c and be done with it.

Attached is an update of my little module that I used to check that
the refactoring was done correctly (regression tests attached), it
also includes a couple of routines to check the performance difference
between one approach and the other, with focus on two things:
- Is pg_strtouint64 really improved?
- How much do we lose by moving to a common interface in the backend
with pg_strtoint?

The good news is that removing strtol from pg_strtouint64 really
improves the performance as already reported, and with one billion
calls in a tight loop you see a clear difference:
=# select pg_strtouint64_old_check('10000', 1000000000);
pg_strtouint64_old_check
--------------------------
10000
(1 row)
Time: 15576.539 ms (00:15.577)
=# select pg_strtouint64_new_check('10000', 1000000000);
pg_strtouint64_new_check
--------------------------
10000
(1 row)
Time: 8820.544 ms (00:08.821)

So the new implementation is more than 40% faster with this
micro-benchmark on my Debian box.

The bad news is that a pg_strtoint() wrapper is not a good idea:
=# select pg_strtoint_check('10000', 1000000000, 4);
pg_strtoint_check
-------------------
10000
(1 row)
Time: 11178.101 ms (00:11.178)
=# select pg_strtoint32_check('10000', 1000000000);
pg_strtoint32_check
---------------------
10000
(1 row)
Time: 9252.894 ms (00:09.253)
So trying to consolidate all error messages leads to a 15% hit with
this test, which sucks.

So, it seems to me that if we want to have a consolidation of
everything, we basically need to have the generic error messages for
the backend directly into common/string.c where the routines are
refactored, and I think that we should do the following based on the
patch attached:
- Just remove pg_strtoint(), and move the error generation logic in
common/string.c.
- Add an error_ok flag for all the pg_strto[u]int{16|32|64} routines,
which generates errors only for FRONTEND is not defined.
- Use pg_log_error() for the frontend errors.

If those errors are added everywhere, we would basically have no code
paths in the frontend or the backend (for the unsigned part only)
which use them yet. Another possibility would be to ignore the
error_ok flag in those cases, but that's inconsistent. My take would
be here to have a more generic error message, like that:
"value \"%s\" is out of range for [unsigned] {16|32|64}-bit integer"
"invalid input syntax for [unsigned] {16|32|64}-bit integer: \"%s\"\n"

I do not suggest to change the messages for the backend for signed
entries though, as these are linked to the data types used.

Attached is an update of my toy module, and an updated patch with what
I have done up to now. This stuff already does a lot, so for now I
have not worked on the removal strtoint() in string.c yet. We could
just do that with a follow-up patch and make it use the new conversion
routines once we are sure that they are in a correct shape. As
strtoint() makes use of strtol(), switching to the new routines will
be much faster anyway...

Fabien, any thoughts?
--
Michael

Attachment Content-Type Size
str2int-12.patch text/x-diff 36.9 KB
overflow.tar.gz application/gzip 5.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jamison, Kirk 2019-09-09 06:52:03 RE: [PATCH] Speedup truncates of relation forks
Previous Message fn ln 2019-09-09 03:58:46 Re: BUG #15977: Inconsistent behavior in chained transactions