Re: Efficient output for integer types

From: David Fetter <david(at)fetter(dot)org>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Efficient output for integer types
Date: 2019-09-20 19:14:51
Message-ID: 20190920191450.GD31596@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 18, 2019 at 04:27:46PM +0900, Kyotaro Horiguchi wrote:
> Hello.
>
> At Wed, 18 Sep 2019 05:42:01 +0200, David Fetter <david(at)fetter(dot)org> wrote in <20190918034201(dot)GX31596(at)fetter(dot)org>
> > On Tue, Sep 17, 2019 at 09:01:57AM +0200, David Fetter wrote:
> > > On Tue, Sep 17, 2019 at 08:55:05AM +0200, David Fetter wrote:
> > > > On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote:
> > > > > Folks,
> > > > >
> > > > > Please find attached a couple of patches intended to $subject.
> > > > >
> > > > > This patch set cut the time to copy ten million rows of randomly sized
> > > > > int8s (10 of them) by about a third, so at least for that case, it's
> > > > > pretty decent.
> > > >
> > > > Added int4 output, removed the sprintf stuff, as it didn't seem to
> > > > help in any cases I was testing.
> > >
> > > Found a couple of "whiles" that should have been "ifs."
> >
> > Factored out some inefficient functions and made the guts use the more
> > efficient function.
>
> I'm not sure this is on the KISS principle, but looked it and
> have several random comments.
>
> +numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT)
>
> I don't think that we are allowing that as project coding
> policy. It seems to have been introduced only to accept external
> code as-is.

Changed to fit current policy.

> - char str[23]; /* sign, 21 digits and '\0' */
> + char str[MAXINT8LEN];
>
> It's uneasy that MAXINT8LEN contains tailling NUL. MAXINT8BUFLEN
> can be so. I think MAXINT8LEN should be 20 and the definition
> should be str[MAXINT8LEN + 1].

Done.

> +static const char DIGIT_TABLE[200] = {
> + '0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', '0', '7', '0', '8', '0', '9',
>
> Wouldn't it be simpler if it were defined as a constant string?
>
> static const char DIGIT_TABLE[201] =
> "000102030405....19"
> "202122232425....39"
> ..

I thought this might be even clearer:

"00" "01" "02" "03" "04" "05" "06" "07" "08" "09"
"10" "11" "12" "13" "14" "15" "16" "17" "18" "19"
"20" "21" "22" "23" "24" "25" "26" "27" "28" "29"
"30" "31" "32" "33" "34" "35" "36" "37" "38" "39"
"40" "41" "42" "43" "44" "45" "46" "47" "48" "49"
"50" "51" "52" "53" "54" "55" "56" "57" "58" "59"
"60" "61" "62" "63" "64" "65" "66" "67" "68" "69"
"70" "71" "72" "73" "74" "75" "76" "77" "78" "79"
"80" "81" "82" "83" "84" "85" "86" "87" "88" "89"
"90" "91" "92" "93" "94" "95" "96" "97" "98" "99";

> +pg_ltoa_n(int32 value, char *a)
> ...
> + /* Compute the result string. */
> + while (value >= 100000000)
>
> We have only two degits above the value. Isn't the stuff inside
> the while a waste of cycles?

Changed the while to an if.

> + /* Expensive 64-bit division. Optimize? */
>
> I believe compiler treats such trivial optimizations. (concretely
> converts into shifts and subtractons if faster.)

Comments removed.

> + memcpy(a + olength - i - 2, DIGIT_TABLE + c0, 2);
>
> Maybe it'd be easy to read if 'a + olength - i' is a single variable.

Done.

> + i += adjust;
> + return i;
>
> If 'a + olength - i' is replaced with a variable, the return
> statement is replacable with "return olength + adjust;".

I'm not sure I understand this.

> + return t + (v >= PowersOfTen[t]);
>
> I think it's better that if it were 't - (v < POT[t]) + 1; /*
> log10(v) + 1 */'. At least we need an explanation of the
> difference. (I'didn't checked the algorithm is truely right,
> though.)

Comments added.

> > void
> > pg_lltoa(int64 value, char *a)
> > {
> ..
> > memcpy(a, "-9223372036854775808", 21);
> ..
> > memcpy(a, "0", 2);
>
> The lines need a comment like "/* length contains trailing '\0'
> */"

Comments added.

> + if (value >= 0)
> ...
> + else
> + {
> + if (value == PG_INT32_MIN)
> + memcpy(str, "-2147483648", 11);
> + return str + 11;
> > }
> + *str++ = '-';
> + return pg_ltostr_zeropad(str, -value, minwidth - 1);
>
> If then block of the if statement were (values < 0), we won't
> need to reenter the functaion.

This is a tail-call recursion, so it's probably optimized already.

> + len = pg_ltoa_n(value, str);
> + if (minwidth <= len)
> + return str + len;
> +
> + memmove(str + minwidth - len, str, len);
>
> If the function had the parameters str with the room only for two
> digits and a NUL, 2 as minwidth but 1000 as value, the function
> would overrun the buffer. The original function just ignores
> overflowing digits.

I believe the original was incorrect.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Attachment Content-Type Size
v7-0001-Make-int4-and-int8-operations-more-efficent.patch text/x-diff 13.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2019-09-20 19:56:57 Re: log bind parameter values on error
Previous Message Fabien COELHO 2019-09-20 18:55:47 Re: pgbench - allow to create partitioned tables