Re: HyperLogLog.c and pg_leftmost_one_pos32()

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <rhaas(at)postgresql(dot)org>
Subject: Re: HyperLogLog.c and pg_leftmost_one_pos32()
Date: 2020-07-30 00:32:55
Message-ID: CAH2-WzkGvDKVDo+0YvfvZ+1CE=iCi88DCOGFF3i1hTGGaxcKPw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 29, 2020 at 10:08 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> Is there a reason that HyperLogLog doesn't use pg_leftmost_one_pos32()?

Yes: HyperLogLog predates pg_leftmost_one_pos32().

> I tried the following patch and some brief performance tests seem to
> show an improvement.

Makes sense.

How did you test this? What kind of difference are we talking about? I
ported this code from the upstream C++ as part of the original
abbreviated keys commit. Note that the cardinality of abbreviated keys
are displayed when you set "trace_sort = on".

> This came up because my recent commit 9878b643 uses HLL for estimating
> the cardinality of spill files, which solves a few annoyances with
> overpartitioning[1].

I think that you should change back the rhs() variable names to match
HyperLogLog upstream (as well as the existing rhs() comments).

> I think it's overall an improvement, but
> addHyperLogLog() itself seemed to show up as a cost, so it can hurt
> spilled-but-still-in-memory cases. I'd also like to backpatch this to
> 13 (as I already did for 9878b643), if that's acceptable.

I still wonder if it was ever necessary to add HLL to abbreviated
keys. It only served to avoid some pretty narrow worse cases, at the
expense of typical cases. Given that the existing users of HLL are
pretty narrow, and given the importance of preserving the favorable
performance characteristics of hash aggregate, I'm inclined to agree
that it's worth backpatching to 13 now. Assuming it is a really
measurable cost in practice.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2020-07-30 00:40:08 Re: Default setting for enable_hashagg_disk
Previous Message Fujii Masao 2020-07-29 23:51:36 Re: [PATCH] Initial progress reporting for COPY command