Re: multivariate statistics (v25)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: "Sven R(dot) Kunze" <srkunze(at)mail(dot)de>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Fetter <david(at)fetter(dot)org>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: multivariate statistics (v25)
Date: 2017-04-05 09:41:29
Message-ID: a80cbb70-ea48-0367-9a40-a5cb6484046e@2ndquadrant.com
Lists: pgsql-hackers

On 04/05/2017 08:41 AM, Sven R. Kunze wrote:
> Thanks Tomas and David for hacking on this patch.
>
> On 04.04.2017 20:19, Tomas Vondra wrote:
>> I'm not sure we still need the min_group_size, when evaluating
>> dependencies. It was meant to deal with 'noisy' data, but I think that
>> after switching to the 'degree' it might actually be a bad idea.
>>
>> Consider this:
>>
>> create table t (a int, b int);
>> insert into t select 1, 1 from generate_series(1, 10000) s(i);
>> insert into t select i, i from generate_series(2, 20000) s(i);
>> create statistics s with (dependencies) on (a,b) from t;
>> analyze t;
>>
>> select stadependencies from pg_statistic_ext ;
>> stadependencies
>> --------------------------------------------
>> [{1 => 2 : 0.333344}, {2 => 1 : 0.333344}]
>> (1 row)
>>
>> So the degree of the dependency is just ~0.333 although it's obviously
>> a perfect dependency, i.e. a knowledge of 'a' determines 'b'. The
>> reason is that we discard 2/3 of rows, because those groups are only a
>> single row each, except for the one large group (1/3 of rows).
>
> Just so I can follow the comments better: is "dependency" roughly the
> same as what statisticians call "conditional probability"?
>

No, it's closer to a 'functional dependency' as known from relational normal forms.
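
To make the arithmetic behind the ~0.333 figure in the quoted example concrete, here is a rough sketch in Python. It is not the patch's code: the function name dependency_degree and the threshold of 3 rows are assumptions chosen for illustration, and it assumes ANALYZE sees every row of the table. A group of rows sharing the same 'a' supports the dependency a => b when all its rows have the same 'b'; the degree is the fraction of rows that fall into supporting groups.

```python
from collections import defaultdict


def dependency_degree(rows, min_group_size=1):
    """Fraction of rows lying in groups of 'a' that map to exactly one 'b'.

    Groups smaller than min_group_size are discarded, i.e. they never
    count as supporting the dependency (hypothetical threshold).
    """
    if not rows:
        return 0.0

    # Collect the 'b' values observed for each distinct 'a'.
    groups = defaultdict(list)
    for a, b in rows:
        groups[a].append(b)

    supporting = 0
    for bs in groups.values():
        consistent = len(set(bs)) == 1  # a single 'b' for this 'a'
        if consistent and len(bs) >= min_group_size:
            supporting += len(bs)

    return supporting / len(rows)


# Same data as the quoted example: 10000 rows of (1, 1),
# plus one row (i, i) for each i in 2..20000.
rows = [(1, 1)] * 10000 + [(i, i) for i in range(2, 20001)]

print(round(dependency_degree(rows, min_group_size=3), 6))  # 0.333344
print(dependency_degree(rows, min_group_size=1))            # 1.0
```

With the threshold, only the one large group counts, so the result is 10000 / 29999. Without it, every group is consistent and the degree is 1.0, which is what one would expect for a perfect functional dependency.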

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
