Re: refactoring - share str2*int64 functions

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: refactoring - share str2*int64 functions
Date: 2019-09-01 05:11:30
Message-ID: 20190901051130.GA27097@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Aug 30, 2019 at 04:50:21PM +0200, Fabien COELHO wrote:
> Given the overheads of the SQL interpreter, I'm unsure about what you would
> measure at the SQL level. You could just write a small standalone C program
> to test perf and accuracy. Maybe this is what you have in mind.

After a certain threshold, you can see the difference anyway by paying
once the overhead of the function. See for example the updated module
attached that I used for my tests.

I have been testing the various implementations, and doing 2B
iterations leads to roughly the following with a non-assert, -O2
build using mul_u32:
- __builtin_sub_overflow => 5s
- cast to uint64 => 5.9s
- division => 8s
You are right as well that having symmetry with the signed methods is
much better. In order to see the difference, you can just do that
with the extension attached, after of course hijacking int.h with some
undefs and recompiling the backend and the module:
select pg_overflow_check(10000, 10000, 2000000000, 'uint32', 'mul');

>> still it is possible to trick things with signed integer arguments.
>
> Is it?

If you pass -1 and then you can fall back to the maximum of each 16,
32 or 64 bits for the unsigned (see the regression tests I added with
the module).

> Do you mean:
>
> sql> SELECT -32768::INT2;
> ERROR: smallint out of range

You are incorrect here, as the minus sign is ignored by the cast.
This works though:
=# SELECT (-32768)::INT2;
int2
--------
-32768
(1 row)

If you look at int2.sql, we do that:
-- largest and smallest values
INSERT INTO INT2_TBL(f1) VALUES ('32767');
INSERT INTO INT2_TBL(f1) VALUES ('-32767');
That's the part I mean is wrong, as the minimum is actually -32768,
but the test fails to consider that. I'll go fix that after
double-checking other similar tests for int4 and int8.

Attached is an updated patch to complete the work for common/int.h,
with the following changes:
- Changed the multiplication methods for uint16 and uint32 to not be
division-based. uint64 can use that only if int128 exists.
- Added comments on top of each sub-sections for the types checked.

Attached is also an updated version of the module I used to validate
this stuff. Fabien, any thoughts?
--
Michael

Attachment Content-Type Size
overflow.tar.gz application/gzip 3.4 KB
str2int-overflows-9.patch text/x-diff 7.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-09-01 05:15:10 Commit fest 2019-09
Previous Message Thomas Munro 2019-09-01 05:04:27 Re: Should we add xid_current() or a int8->xid cast?