Re: Yet another abort-early plan disaster on 9.3

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Yet another abort-early plan disaster on 9.3
Date: 2014-10-02 19:56:27
Message-ID: 542DADEB.2080103@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On 10/02/2014 02:30 AM, Peter Geoghegan wrote:
> On Thu, Oct 2, 2014 at 1:19 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> Having read papers on it, I believe the problem is intractable. Coding
>> is not the issue. To anyone: please prove me wrong, in detail, with
>> references so it can be coded.
>
> I think it might be close to intractable if you're determined to use a
> sampling model. HyperLogLog looks very interesting for n_distinct
> estimation, though. My abbreviated key patch estimates the cardinality
> of abbreviated keys (and original strings that are to be sorted) with
> high precision and fixed overhead. Maybe we can figure out a way to
> do opportunistic streaming of HLL. Believe it or not, the way I use
> HLL for estimating cardinality is virtually free. Hashing is really
> cheap when the CPU is bottlenecked on memory bandwidth.

Yes, it's only intractable if you're wedded to the idea of a tiny,
fixed-size sample. If we're allowed to sample, say, 1% of the table, we
can get a MUCH more accurate n_distinct estimate using multiple
algorithms, of which HLL is one. While n_distinct will still have some
variance, it'll be over a much smaller range.

The n_distinct algo we use in Postgres is specifically designed (by its
author) to choose the smallest reasonable number of distinct values
capable of producing the observed distribution. This made sense when we
added it because we didn't have query plans where underestimating
n_distinct produced a penalty. Now we do, and we ought to change algos.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2014-10-02 20:10:24 Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Previous Message José Luis Tallón 2014-10-02 19:48:31 Re: DDL Damage Assessment

Browse pgsql-performance by date

  From Date Subject
Next Message George Neuner 2014-10-02 23:00:58 help: function failing
Previous Message Jeff Janes 2014-10-02 15:34:35 Re: auto vaccum is dying