Re: refactoring - share str2*int64 functions

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: refactoring - share str2*int64 functions
Date: 2019-09-04 10:49:17
Message-ID: alpine.DEB.2.21.1909041027080.8057@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Bonjour Michaël,

>> Attached a rebased version which implements the int64/uint64 stuff. It is
>> basically the previous patch without the overflow inlined functions.
>
> - if (!strtoint64(yytext, true, &yylval->ival))
> + if (unlikely(pg_strtoint64(yytext, &yylval->ival) != PG_STRTOINT_OK))

> It seems to me that we should have unlikely() only within
> pg_strtoint64(), pg_strtouint64(), etc.

From a compiler perspective, the (un)likely tip is potentially useful on
any test. We know when parsing a that it is very unlikely that the string
conversion would fail, so we can tell that, so that the compiler knows
which branch it should optimize first.

You can argue against that if the functions are inlined, because maybe the
compiler would propagate the information, but for distinct functions
compiled separately the information is useful at each level.

> - /* skip leading spaces; cast is consistent with strtoint64 */
> - while (*ptr && isspace((unsigned char) *ptr))

> + /* skip leading spaces; cast is consistent with pg_strtoint64 */
> + while (isspace((unsigned char) *ptr))
> ptr++;

> What do you think about splitting this part in two? I would suggest
> to do the refactoring in a first patch, and the consider all the
> optimizations for the routines you have in mind afterwards.

I would not bother.

> I think that we don't actually need is_an_int() and str2int64() at all
> in pgbench.c as we could just check for the return code of
> pg_strtoint64() and switch to the double parsing only if we don't have
> PG_STRTOINT_OK.

Yep, you are right, now that the conversion functions does not error out a
message, its failure can be used as a test.

The version attached changes slightly the semantics, because on int
overflows a double conversion will be attempted instead of erroring. I do
not think that it is worth the effort of preserving the previous semantic
of erroring.

> scanint8() only has one caller in the backend with your patch, and we
> don't check after its return result in int8.c, so I would suggest to
> move the error handling directly in int8in() and to remove scanint8().

Ok.

> I think that we should also introduce the [u]int[16|32] flavors and
> expand them in the code in a single patch, in a way consistent with
> what you have done for int64/uint64 as the state that we reach with
> the patch is kind of weird as there are existing versions numutils.c.

Before dealing with the 16/32 versions, which involve quite a significant
amount of changes, I would want a clear message that "the 64 bit approach"
is the model to follow.

Moreover, I'm unsure how to rename the existing pg_strtoint32 and others
which call ereport, if the name is used for the common error returning
version.

> Have you done some performance testing of the patch? The COPY
> bulkload is a good example of that:
> https://www.postgresql.org/message-id/20190717181428.krqpmduejbqr4m6g%40alap3.anarazel.de

I have done no such thing for now.

I would not expect any significant performance difference when loading
int8 things because basically scanint8 has just been renamed pg_strtoint64
and made global, and that is more or less all. It might be a little bit
slower because possible the compiler cannot inline the conversion, but on
the other hand, the *likely hints and removed tests may marginaly help
performance. I think that the only way to test performance significantly
would be to write a specific program that loops over a conversion.

--
Fabien.

Attachment Content-Type Size
str2int-11.patch text/x-diff 20.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ibrar Ahmed 2019-09-04 11:05:03 Re: fix for BUG #3720: wrong results at using ltree
Previous Message Rafia Sabih 2019-09-04 09:37:48 Re: [PATCH] Incremental sort (was: PoC: Partial sort)