Re: multivariate statistics v14

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: tomas(dot)vondra(at)2ndquadrant(dot)com
Cc: jeff(dot)janes(at)gmail(dot)com, alvherre(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: multivariate statistics v14
Date: 2016-03-16 02:29:07
Message-ID: 20160316.112907.1269707811749756579.t-ishii@sraoss.co.jp
Lists: pgsql-hackers

> Instead of simply multiplying the ndistinct estimate by the selectivity,
> we use the formula for the expected number of distinct values
> observed in 'k' rows when there are 'd' distinct values in the bin
>
> d * (1 - ((d - 1) / d)^k)
>
> This is 'with replacement', which seems appropriate for the use, and it
> mostly assumes uniform distribution of the distinct values. So if the
> distribution is not uniform (e.g. there are very frequent groups) this
> may be less accurate than the current algorithm in some cases, giving
> over-estimates. But that's probably better than OOM.
> ---
> src/backend/utils/adt/selfuncs.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
> index f8d39aa..6eceedf 100644
> --- a/src/backend/utils/adt/selfuncs.c
> +++ b/src/backend/utils/adt/selfuncs.c
> @@ -3466,7 +3466,7 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
> /*
> * Multiply by restriction selectivity.
> */
> - reldistinct *= rel->rows / rel->tuples;
> + reldistinct = reldistinct * (1 - powl((reldistinct - 1) / reldistinct,rel->rows));

Why did you change the "*=" style? I see no reason to change it.

reldistinct *= 1 - powl((reldistinct - 1) / reldistinct, rel->rows);

Looks better to me because it's shorter and cleaner.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp
