Re: HyperLogLog.c and pg_leftmost_one_pos32()

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <rhaas(at)postgresql(dot)org>
Subject: Re: HyperLogLog.c and pg_leftmost_one_pos32()
Date: 2020-07-30 18:25:06
Message-ID: 3e6c57ddf94a2f1e149485de23d25459a6548067.camel@j-davis.com
Lists: pgsql-hackers

On Thu, 2020-07-30 at 19:16 +0200, Tomas Vondra wrote:
> > Essentially:
> > initHyperLogLog(&hll, 5)
> > for i in 0 .. one billion
> > addHyperLogLog(&hll, hash(i))
> > estimateHyperLogLog
> >
> > The numbers are the same regardless of bwidth.
> >
> > Before my patch, it takes about 15.6s. After my patch, it takes about
> > 6.6s, so it's more than a 2X speedup (including the hash calculation).
> >
>
> Wow. That's a huge improvement.

To be clear: the 2X+ speedup was only on the tight-loop test, not on a whole query.
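For reference, the per-element work in that tight loop is: use the top bits of the hash to pick a register, then record the position of the leftmost 1-bit in the remaining bits. Below is a hypothetical, self-contained HyperLogLog sketch in C, not PostgreSQL's hyperloglog.c; every name in it (hll_sketch, hll_init, hll_add, hll_estimate, rho_loop, rho_clz, mix32) is made up for illustration. It assumes GCC or Clang for __builtin_clz and uses the MurmurHash3 fmix32 finalizer as a stand-in hash. It shows two equivalent ways to compute the leftmost-one position: a bit-at-a-time loop, and a single count-leading-zeros operation of the kind pg_leftmost_one_pos32() presumably exposes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical minimal HyperLogLog, for illustration only (not hyperloglog.c). */
typedef struct
{
	uint8_t		bwidth;		/* index bits: the sketch has 2^bwidth registers */
	uint32_t	nregs;		/* 2^bwidth */
	uint8_t    *regs;		/* largest rho seen per register */
} hll_sketch;

/* rho, one bit per iteration: leading zeros of x plus 1, capped at b + 1 */
static uint8_t
rho_loop(uint32_t x, uint8_t b)
{
	uint8_t		j = 1;

	while (j <= b && !(x & 0x80000000u))
	{
		j++;
		x <<= 1;
	}
	return j;
}

/* Same result from one count-leading-zeros (GCC/Clang; assumes 32-bit unsigned int) */
static uint8_t
rho_clz(uint32_t x, uint8_t b)
{
	uint8_t		j;

	if (x == 0)
		return b + 1;			/* __builtin_clz(0) is undefined */
	j = (uint8_t) (__builtin_clz(x) + 1);
	return j > b ? b + 1 : j;
}

/* MurmurHash3 fmix32 finalizer, a stand-in for the real hash function */
static uint32_t
mix32(uint32_t h)
{
	h ^= h >> 16;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	h *= 0xc2b2ae35u;
	h ^= h >> 16;
	return h;
}

static void
hll_init(hll_sketch *s, uint8_t bwidth)
{
	assert(bwidth >= 4 && bwidth <= 16);
	s->bwidth = bwidth;
	s->nregs = UINT32_C(1) << bwidth;
	s->regs = calloc(s->nregs, 1);
	assert(s->regs != NULL);
}

static void
hll_add(hll_sketch *s, uint32_t hash)
{
	uint32_t	idx = hash >> (32 - s->bwidth);	/* top bits pick a register */
	uint8_t		r = rho_clz(hash << s->bwidth, 32 - s->bwidth);

	if (r > s->regs[idx])
		s->regs[idx] = r;
}

/*
 * Raw estimate alpha_m * m^2 / sum(2^-M[j]).  The paper's small- and
 * large-range corrections are omitted, so this is only reasonable when the
 * number of distinct values is well above 2.5 * m.
 */
static double
hll_estimate(const hll_sketch *s)
{
	double		m = s->nregs;
	double		alpha = m == 16 ? 0.673 : m == 32 ? 0.697 : m == 64 ? 0.709
		: 0.7213 / (1.0 + 1.079 / m);
	double		sum = 0.0;

	for (uint32_t i = 0; i < s->nregs; i++)
		sum += 1.0 / (double) (UINT64_C(1) << s->regs[i]);
	return alpha * m * m / sum;
}

static void
hll_free(hll_sketch *s)
{
	free(s->regs);
	s->regs = NULL;
}
```

Mirroring the quoted test, one would call hll_init(&s, 5), then hll_add(&s, mix32(i)) inside the loop, then hll_estimate(&s). Only the rho computation differs between the two variants: rho_loop may iterate up to 32 - bwidth times per element, while rho_clz does a single count-leading-zeros, so that is the part of the per-element cost such a change can shrink; the hashing cost is unchanged.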

> What does the whole test (data + query) look like? Is it a particularly
> rare / special case, or something reasonable to expect in practice?

The whole-query test was:

config:
shared_buffers=8GB
jit = off
max_parallel_workers_per_gather=0

setup:
create table t_1m_20(i int);
insert into t_1m_20 select (random()*1000000)::int4
from generate_series(1,20000000);
vacuum (freeze, analyze) t_1m_20;

query:
set work_mem='2048kB';
SELECT pg_prewarm('t_1m_20', 'buffer');

-- median of the three runs
select distinct i from t_1m_20 offset 10000000;
select distinct i from t_1m_20 offset 10000000;
select distinct i from t_1m_20 offset 10000000;

results:
f2130e77 (before using HLL): 6787ms
f1af75c5 (before my recent commit): 7170ms
fd734f38 (master now): 6990ms

My previous results, measured before I committed the patch (and
therefore not on exactly the same SHA1s), were 6812ms, 7158ms, and
6898ms. So my most recent batch of results is slightly worse, but the
most recent commit (fd734f38) still shows an improvement of a couple
percent over f1af75c5.
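The "couple percent" figure can be checked directly from the timings above. A trivial helper (the name pct_faster is hypothetical, for illustration only) that expresses the change as a percentage of the "before" time:

```c
#include <assert.h>

/*
 * Change from before_ms to after_ms as a percentage of before_ms;
 * positive means after_ms is faster.  Hypothetical helper.
 */
static double
pct_faster(double before_ms, double after_ms)
{
	assert(before_ms > 0.0);
	return (before_ms - after_ms) / before_ms * 100.0;
}
```

pct_faster(7170.0, 6990.0) (f1af75c5 vs fd734f38) comes out at about 2.5, and the earlier pre-commit pair, pct_faster(7158.0, 6898.0), at about 3.6.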

Regards,
Jeff Davis
