Re: HyperLogLog.c and pg_leftmost_one_pos32()

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <rhaas(at)postgresql(dot)org>
Subject: Re: HyperLogLog.c and pg_leftmost_one_pos32()
Date: 2020-07-30 16:21:23
Message-ID: 1a112df8b6d82c2701c57f9457f872ecc20128b3.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 2020-07-29 at 17:32 -0700, Peter Geoghegan wrote:
> How did you test this? What kind of difference are we talking about?

Essentially:
initHyperLogLog(&hll, 5)
for i in 0 .. one billion
addHyperLogLog(&hll, hash(i))
estimateHyperLogLog

The numbers are the same regardless of bwidth.

Before my patch, it takes about 15.6s. After my patch, it takes about
6.6s, so it's more than a 2X speedup (including the hash calculation).

As a part of HashAgg, when I test the worst case, it goes from a 4-5%
penalty to ~1% (within noise).

> I think that you should change back the rhs() variable names to match
> HyperLogLog upstream (as well as the existing rhs() comments).

Done.

> > I think it's overall an improvement, but
> > addHyperLogLog() itself seemed to show up as a cost, so it can hurt
> > spilled-but-still-in-memory cases. I'd also like to backpatch this
> > to
> > 13 (as I already did for 9878b643), if that's acceptable.
>
> I still wonder if it was ever necessary to add HLL to abbreviated
> keys. It only served to avoid some pretty narrow worse cases, at the
> expense of typical cases. Given that the existing users of HLL are
> pretty narrow, and given the importance of preserving the favorable
> performance characteristics of hash aggregate, I'm inclined to agree
> that it's worth backpatching to 13 now. Assuming it is a really
> measurable cost in practice.

Yes, the difference (at least in a tight loop, on my machine) is not
subtle. I went ahead and committed and backpatched.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-07-30 17:16:19 Re: HyperLogLog.c and pg_leftmost_one_pos32()
Previous Message Stephen Frost 2020-07-30 14:37:56 Re: Missing CFI in hlCover()?