Re: PATCH: Extending the HyperLogLog API a bit

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: Extending the HyperLogLog API a bit
Date: 2016-01-19 22:22:48
Message-ID: CAM3SWZSCfELXb1jocSPF-ZxXic1fSUtZj5oCgeKR6dnnhahrFg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 19, 2016 at 2:03 PM, Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> FWIW I've been considering adding APPROX_COUNT_DISTINCT() aggregate,
> similarly to what other databases (e.g. Vertica) have built-in. Now, that
> would not require the merge too, but we're currently baking support for
> 'combine' functions, and that's exactly what merge does.
>
> So why not just fix the bug?

You can go from the sparse representation to the dense representation,
so in principle you can merge two of our HLL states, if you are then
satisfied with having a new representation for the merged state. I
don't have time right now to do a full analysis of whether or not it's
possible to just fix the bug without doing all that, but I think it
might not be.

I think we benefit from the simplicity of the sparse representation.
So, in the absence of a clear justification for retaining
mergeHyperLogLog(), ripping it out seems best.

I also think that an expanded set of functionality would be required
for your APPROX_COUNT_DISTINCT() patch anyway, including support for
multiple representations (perhaps this isn't documented in your
APPROX_COUNT_DISTINCT(), but complete implementations seem to switch
from sparse to full at a certain point). So, ripping out
mergeHyperLogLog() doesn't really make that effort any more difficult.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomasz Rybak 2016-01-19 22:23:01 Re: pglogical_output - a general purpose logical decoding output plugin
Previous Message Brar Piening 2016-01-19 22:05:48 Infer INOUT Parameters from Frontend/Backend Protocol