PATCH: Extending the HyperLogLog API a bit

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: PATCH: Extending the HyperLogLog API a bit
Date: 2015-12-31 20:48:45
Message-ID: 568594AD.5030101@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

while working on the bloom filters for hashjoins, I've started using the
HLL library committed as part of the sorting improvements for 9.5. I
propose adding two more functions to the API, which I think are quite
useful:

1) initHyperLogLogError(hyperLogLogState *cState, double error)

Instead of specifying bwidth (essentially the number of bits used
for addressing in the counter), this allows specifying the expected
error rate for the counter, which is

error_rate = 1.04 / sqrt(2^bwidth)

So for 5% we get bwidth=5, and so on. This makes the API a bit easier
the use, because there are pretty much no comments about the meaning
of bwidth, and the existing callers simply use 10 without details.

2) freeHyperLogLog(hyperLogLogState *cState)

I think it's a good idea to provide function "undoing" what init
does, i.e. freeing the internal memory etc. Currently that's trivial
to do, but perhaps we'll make the structure more complicated in the
future (albeit that might be unlikely).

FWIW I've already used this in the patch marrying hash joins and bloom
filters.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2015-12-31 20:49:30 Re: PATCH: Extending the HyperLogLog API a bit
Previous Message Tomas Vondra 2015-12-31 20:38:35 Re: WIP: bloom filter in Hash Joins with batches