Re: Efficient output for integer types

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: david(at)fetter(dot)org
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Efficient output for integer types
Date: 2019-09-18 07:27:46
Message-ID: 20190918.162746.139473050.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello.

At Wed, 18 Sep 2019 05:42:01 +0200, David Fetter <david(at)fetter(dot)org> wrote in <20190918034201(dot)GX31596(at)fetter(dot)org>
> On Tue, Sep 17, 2019 at 09:01:57AM +0200, David Fetter wrote:
> > On Tue, Sep 17, 2019 at 08:55:05AM +0200, David Fetter wrote:
> > > On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote:
> > > > Folks,
> > > >
> > > > Please find attached a couple of patches intended to $subject.
> > > >
> > > > This patch set cut the time to copy ten million rows of randomly sized
> > > > int8s (10 of them) by about a third, so at least for that case, it's
> > > > pretty decent.
> > >
> > > Added int4 output, removed the sprintf stuff, as it didn't seem to
> > > help in any cases I was testing.
> >
> > Found a couple of "whiles" that should have been "ifs."
>
> Factored out some inefficient functions and made the guts use the more
> efficient function.

I'm not sure this is on the KISS principle, but looked it and
have several random comments.

+numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT)

I don't think that we are allowing that as project coding
policy. It seems to have been introduced only to accept external
code as-is.

- char str[23]; /* sign, 21 digits and '\0' */
+ char str[MAXINT8LEN];

It's uneasy that MAXINT8LEN contains tailling NUL. MAXINT8BUFLEN
can be so. I think MAXINT8LEN should be 20 and the definition
should be str[MAXINT8LEN + 1].

+static const char DIGIT_TABLE[200] = {
+ '0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', '0', '7', '0', '8', '0', '9',

Wouldn't it be simpler if it were defined as a constant string?

static const char DIGIT_TABLE[201] =
"000102030405....19"
"202122232425....39"
..

+pg_ltoa_n(int32 value, char *a)
...
+ /* Compute the result string. */
+ while (value >= 100000000)

We have only two degits above the value. Isn't the stuff inside
the while a waste of cycles?

+ /* Expensive 64-bit division. Optimize? */

I believe compiler treats such trivial optimizations. (concretely
converts into shifts and subtractons if faster.)

+ memcpy(a + olength - i - 2, DIGIT_TABLE + c0, 2);

Maybe it'd be easy to read if 'a + olength - i' is a single variable.

+ i += adjust;
+ return i;

If 'a + olength - i' is replaced with a variable, the return
statement is replacable with "return olength + adjust;".

+ return t + (v >= PowersOfTen[t]);

I think it's better that if it were 't - (v < POT[t]) + 1; /*
log10(v) + 1 */'. At least we need an explanation of the
difference. (I'didn't checked the algorithm is truely right,
though.)

> void
> pg_lltoa(int64 value, char *a)
> {
..
> memcpy(a, "-9223372036854775808", 21);
..
> memcpy(a, "0", 2);

The lines need a comment like "/* length contains trailing '\0'
*/"

+ if (value >= 0)
...
+ else
+ {
+ if (value == PG_INT32_MIN)
+ memcpy(str, "-2147483648", 11);
+ return str + 11;
> }
+ *str++ = '-';
+ return pg_ltostr_zeropad(str, -value, minwidth - 1);

If then block of the if statement were (values < 0), we won't
need to reenter the functaion.

+ len = pg_ltoa_n(value, str);
+ if (minwidth <= len)
+ return str + len;
+
+ memmove(str + minwidth - len, str, len);

If the function had the parameters str with the room only for two
digits and a NUL, 2 as minwidth but 1000 as value, the function
would overrun the buffer. The original function just ignores
overflowing digits.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2019-09-18 07:31:40 Re: pgbench - allow to create partitioned tables
Previous Message Soumyadeep Chakraborty 2019-09-18 06:54:51 Don't codegen deform code for virtual tuples in expr eval for scan fetch