Re: Yet another abort-early plan disaster on 9.3

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Yet another abort-early plan disaster on 9.3
Date: 2014-10-03 23:41:50
Message-ID: 542F343E.40906@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On 3.10.2014 02:54, Peter Geoghegan wrote:
> On Thu, Oct 2, 2014 at 12:56 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> Yes, it's only intractable if you're wedded to the idea of a tiny,
>> fixed-size sample. If we're allowed to sample, say, 1% of the
>> table, we can get a MUCH more accurate n_distinct estimate using
>> multiple algorithms, of which HLL is one. While n_distinct will
>> still have some variance, it'll be over a much smaller range.
>
> I think that HyperLogLog, as a streaming algorithm, will always
> require that the entire set be streamed. This doesn't need to be a
> big deal - in the case of my abbreviated key patch, it appears to
> basically be free because of the fact that we were streaming
> everything anyway. It's a very cool algorithm, with fixed overhead
> and constant memory usage. It makes very useful guarantees around
> accuracy.

I think you're mixing two things here - estimating the number of
distinct values in a sample (which can be done very efficiently using
HLL) and estimating the number of distinct values in the whole table.
For that HLL is not usable, unless you process all the data.

Sadly HLL is rather incompatible with the usual estimators, because the
ones I'm aware of need to know the number of occurences for the distinct
values etc.

But couldn't we just piggyback this on autovacuum? One of the nice HLL
features is that it's additive - you can build "partial counters" for
ranges of blocks (say, a few MBs per range), and then merge them when
needed. By keeping the parts it's possible to rebuild it separately.

But maybe this idea is way too crazy ...

regards
Tomas

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2014-10-03 23:48:59 Re: Fixed xloginsert_locks for 9.4
Previous Message Gregory Smith 2014-10-03 23:39:25 Re: Fixed xloginsert_locks for 9.4

Browse pgsql-performance by date

  From Date Subject
Next Message Rodrigo Barboza 2014-10-04 17:31:09 Re: auto vaccum is dying
Previous Message Tomas Vondra 2014-10-03 23:30:34 Re: Yet another abort-early plan disaster on 9.3