Re: Slim down integer formatting

From: David Fetter <david(at)fetter(dot)org>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Slim down integer formatting
Date: 2021-07-28 02:25:43
Message-ID: 20210728022542.GM18391@fetter.org
Lists: pgsql-hackers

On Wed, Jul 28, 2021 at 01:17:43PM +1200, David Rowley wrote:
> On Wed, 28 Jul 2021 at 01:44, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> > So how much faster is it than the original?
>
> I only did some very quick tests. They're a bit noisy. The results
> indicate an average speedup of 1.7%, but the noise level is above
> that, so unsure.
>
> create table a (a int);
> insert into a select a from generate_series(1,1000000)a;
> vacuum freeze a;
>
> bench.sql: copy a to '/dev/null';
>
> master @ 93a0bf239
> drowley(at)amd3990x:~$ pgbench -n -f bench.sql -T 60 postgres
> latency average = 153.815 ms
> latency average = 152.955 ms
> latency average = 147.491 ms
>
> master + v2 patch
> drowley(at)amd3990x:~$ pgbench -n -f bench.sql -T 60 postgres
> latency average = 144.749 ms
> latency average = 151.525 ms
> latency average = 150.392 ms
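
For reference, the 1.7% figure quoted above can be reproduced from the six latency averages. This is only a minimal sketch of the arithmetic, assuming the speedup is the relative drop in the mean of the three runs; the function names are made up for illustration.

```python
from statistics import mean

# Latency averages in ms, copied from the pgbench runs quoted above.
master_ms = [153.815, 152.955, 147.491]
patched_ms = [144.749, 151.525, 150.392]


def speedup_percent(before, after):
    """Relative reduction in mean latency, as a percentage."""
    return (mean(before) - mean(after)) / mean(before) * 100.0


def spread_percent(runs):
    """Run-to-run range (max - min) as a percentage of the mean."""
    return (max(runs) - min(runs)) / mean(runs) * 100.0


print(f"speedup:       {speedup_percent(master_ms, patched_ms):.1f}%")
print(f"master spread: {spread_percent(master_ms):.1f}%")
print(f"patch spread:  {spread_percent(patched_ms):.1f}%")
```

The spread within each set of runs comes out larger than the speedup, which is the "noise level is above that" caveat in numbers.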

Thanks for testing this! I got a few promising results early on with
-O0, and the technique seemed like a neat way to do things.

I generated a million int4s intended to be uniformly distributed
across the range of int4, and similarly across int8.
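
One way such test data could be produced is sketched below. This is not the script used for the numbers in this message; it assumes "the range" means the full signed range including negatives, and the helper name and seed are hypothetical.

```python
import random

# Signed ranges of PostgreSQL's int4 and int8 types.
INT4_MIN, INT4_MAX = -2**31, 2**31 - 1
INT8_MIN, INT8_MAX = -2**63, 2**63 - 1


def uniform_ints(n, lo, hi, seed=0):
    """Return n integers drawn uniformly from [lo, hi] (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so a benchmark can be repeated
    return [rng.randint(lo, hi) for _ in range(n)]


# Small demonstration; a real test table would use n = 1_000_000 and
# write one value per line to a file that COPY ... FROM can load.
for value in uniform_ints(5, INT4_MIN, INT4_MAX):
    print(value)
```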

int4:
                  patch        6feebcb6b44631c3dc435e971bd80c2dd218a5ab
latency average:  362.149 ms   359.933 ms
latency stddev:   3.44 ms      3.40 ms

int8:
                  patch        6feebcb6b44631c3dc435e971bd80c2dd218a5ab
latency average:  434.944 ms   422.270 ms
latency stddev:   3.23 ms      4.02 ms

When compiled with -O2:

int4:
                  patch        6feebcb6b44631c3dc435e971bd80c2dd218a5ab
latency average:  167.262 ms   148.673 ms
latency stddev:   6.26 ms      1.28 ms

i.e. the patch was actually slower for int4, at least over the 10 runs I did.

I assume that "uniform distribution across the range" is a bad-case
scenario for ints, but I was a little surprised to measure worse
performance. Interestingly, what I got for int8s generated to be
uniform across their range, also at -O2, was

int8:
                  patch        6feebcb6b44631c3dc435e971bd80c2dd218a5ab
latency average:  171.737 ms   174.013 ms
latency stddev:   1.94 ms      6.84 ms

which doesn't look like a difference to me.
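
A crude way to put the two -O2 results side by side is to express each gap in units of the larger reported stddev. This is only a rough sketch, not a significance test; it assumes the first column of each table is the patched build and the second is the named commit, and the function name is made up.

```python
def gap_in_stddevs(mean_a, sd_a, mean_b, sd_b):
    """Absolute difference of the means, in units of the larger stddev."""
    return abs(mean_a - mean_b) / max(sd_a, sd_b)


# (patch mean, patch stddev, baseline mean, baseline stddev), all in ms,
# taken from the -O2 tables above. The column order is an assumption.
results_o2 = {
    "int4": (167.262, 6.26, 148.673, 1.28),
    "int8": (171.737, 1.94, 174.013, 6.84),
}

for name, row in results_o2.items():
    print(f"{name}: gap = {gap_in_stddevs(*row):.2f} stddevs")
```

By this measure the int4 gap is roughly three stddevs, while the int8 gap is about a third of one.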

Intuitively, I'd expect us to get things in the neighborhood of 1 a
lot more often than things in the neighborhood of 1 << (30 or 60). Do
we have some idea of the distribution, or at least of the distribution
family, that we should expect for ints?
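
To make the "bad case" concrete, here is a small sketch that counts exactly how many non-negative int4 values have each decimal digit count. It assumes a uniform draw over [0, 2^31 - 1] and ignores negatives for simplicity; the function name is hypothetical.

```python
def digit_count_shares(max_value):
    """Exact share of integers in [0, max_value] with each decimal digit count."""
    total = max_value + 1
    shares = {}
    digits = 1
    lo = 0  # smallest value with the current digit count (0 counts as 1 digit)
    while lo <= max_value:
        hi = min(10**digits - 1, max_value)  # largest value with this many digits
        shares[digits] = (hi - lo + 1) / total
        lo = hi + 1
        digits += 1
    return shares


for digits, share in digit_count_shares(2**31 - 1).items():
    print(f"{digits:2d} digits: {share:.4%}")
```

Under that assumption about 53% of the values have ten digits and about 42% have nine, so a uniform test set almost never exercises the short-number path that values near 1 would hit.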

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
