Re: PATCH: adaptive ndistinct estimator v4

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: adaptive ndistinct estimator v4
Date: 2015-04-15 06:45:55
Message-ID: CAMkU=1ySyCY1=8ZEeaEEPWD-9wn7ccXbQ6o=UJHU=3ZqA3-kxw@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Tue, Mar 31, 2015 at 12:02 PM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:

> Hi all,
>
> attached is v4 of the patch implementing adaptive ndistinct estimator.
>

Hi Tomas,

I have a case here where the adaptive algorithm underestimates ndistinct by
a factor of 7 while the default estimator is pretty close.

5MB file:

https://drive.google.com/file/d/0Bzqrh1SO9FcETU1VYnQxU2RZSWM/view?usp=sharing

# create table foo2 (x text);
# \copy foo2 from program 'bzcat ~/temp/foo1.txt.bz2'
# analyze verbose foo2;
INFO: analyzing "public.foo2"
INFO: "foo2": scanned 6021 of 6021 pages, containing 1113772 live rows and
0 dead rows; 30000 rows in sample, 1113772 estimated total rows
WARNING: ndistinct estimate current=998951.78 adaptive=135819.00
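
For reference, the gap between the two numbers in the WARNING line can be checked with a few lines of arithmetic. The sketch below is in Python and is not part of the patch. It also includes what I understand to be the formula behind the "current" estimate (the Haas-Stokes Duj1 estimator, n*d / (n - f1 + f1*n/N), before any clamping ANALYZE applies). The function name duj1 and its arguments d (distinct values in the sample) and f1 (values seen exactly once in the sample) are my own labels, and the log above does not report d or f1, so the function is only exercised on its two limiting cases.

```python
# Numbers copied from the ANALYZE VERBOSE output above.
sample_rows = 30000
total_rows = 1113772
current_estimate = 998951.78
adaptive_estimate = 135819.00

# How far apart the two estimators are, and how much of the table was sampled.
ratio = current_estimate / adaptive_estimate
sample_fraction = sample_rows / total_rows


def duj1(n, N, d, f1):
    """Haas-Stokes Duj1 estimate (assumed form of the default estimator).

    n  = rows in the sample
    N  = rows in the table
    d  = distinct values in the sample (hypothetical input, not in the log)
    f1 = values occurring exactly once in the sample (hypothetical input)
    """
    return (n * d) / ((n - f1) + f1 * n / N)


# Limiting cases: no singletons gives d back; all singletons gives N.
no_singletons = duj1(sample_rows, total_rows, d=5000, f1=0)
all_singletons = duj1(sample_rows, total_rows, d=sample_rows, f1=sample_rows)

print(f"current/adaptive ratio: {ratio:.2f}")
print(f"sample fraction: {sample_fraction:.4f}")
```

The ratio comes out a little above 7, from a sample of under 3% of the rows. To see which estimate is closer for this file, the exact value can be obtained with: select count(distinct x) from foo2;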

Cheers,

Jeff
