Re: PATCH: adaptive ndistinct estimator v4

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PATCH: adaptive ndistinct estimator v4
Date: 2015-03-31 19:02:29
Message-ID: 551AEF45.7010700@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Hi all,

attached is v4 of the patch implementing adaptive ndistinct estimator.

I've been looking into the strange estimates, mentioned on 2014/12/07:

> values current adaptive
> ------------------------------
> 106 99 107
> 106 8 6449190
> 1006 38 6449190
> 10006 327 42441

I suspected this might be some sort of rounding error in the numerical
optimization (looking for 'm' solving the equation from paper), but
turns out that's not the case.

The adaptive estimator is a bit unstable for skewed distributions, that
are not sufficiently smooth. Whenever f[1] or f[2] was 0 (i.e. there
were no values occuring exactly once or twice in the sample), the result
was rather off.

The simple workaround for this was adding a fallback to GEE when f[1] or
f[2] is 0. GEE is another estimator described in the paper, behaving
much better in those cases.

With the current version, I do get this (with statistics_target=10):

values current adaptive
------------------------------
106 99 108
106 8 178
1006 38 2083
10006 327 11120

The results do change a bit based on the sample, but these values are a
good example of the values I'm getting.

The other examples (with skewed but smooth distributions) work as good
as before.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
ndistinct-estimator-v4.patch text/x-diff 13.1 KB
ndistinct.sql application/sql 3.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2015-03-31 19:07:08 Re: GUC context information in the document.
Previous Message Tom Lane 2015-03-31 18:37:17 Re: Bug #10432 failed to re-find parent key in index

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2015-03-31 19:52:33 Some performance testing?
Previous Message David G. Johnston 2015-03-31 17:03:10 Re: Weird CASE WHEN behaviour causing query to be suddenly very slow