Skip site navigation (1) Skip section navigation (2)

Re: Collect frequency statistics for arrays

From: Noah Misch <noah(at)leadboat(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Nathan Boley <npboley(at)gmail(dot)com>,pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collect frequency statistics for arrays
Date: 2012-01-17 10:33:22
Message-ID: 20120117103322.GA19977@tornado.leadboat.com (view raw or flat)
Thread:
Lists: pgsql-hackers
On Tue, Jan 17, 2012 at 12:04:06PM +0400, Alexander Korotkov wrote:
> Thanks for your fixes to the patch. Them looks correct to me. I did some
> fixes in the patch. The proof of some concepts is still needed. I'm going
> to provide it in a few days.

Your further fixes look good.  Could you also answer my question about the
header comment of mcelem_array_contained_selec()?

/*
 * Estimate selectivity of "column <@ const" based on most common element
 * statistics.	Independent element occurrence would imply a particular
 * distribution of distinct element counts among matching rows.  Real data
 * usually falsifies that assumption.  For example, in a set of 1-element
 * integer arrays having elements in the range [0;10], element occurrences are
 * not independent.  If they were, a sufficiently-large set would include all
 * distinct element counts 0 through 11.  We correct for this using the
 * histogram of distinct element counts.
 *
 * In the "column @> const" and "column && const" cases, we usually have
 * "const" with low summary frequency of elements (otherwise we have
 * selectivity close to 0 or 1 correspondingly).  That's why the effect of
 * dependence related to distinct element counts distribution is negligible
 * there.  In the "column <@ const" case, summary frequency of elements is
 * high (otherwise we have selectivity close to 0).  That's why we should do
 * correction due to array distinct element counts distribution.
 */

By "summary frequency of elements", do you mean literally P_0 + P_1 ... + P_N?
If so, I can follow the above argument for "column && const" and "column <@
const", but not for "column @> const".  For "column @> const", selectivity
cannot exceed the smallest frequency among const elements.  A number of
high-frequency elements will drive up the sum of the frequencies without
changing the true selectivity much at all.

Thanks,
nm

In response to

Responses

pgsql-hackers by date

Next:From: Fujii MasaoDate: 2012-01-17 10:38:23
Subject: Re: Online base backup from the hot-standby
Previous:From: Simon RiggsDate: 2012-01-17 10:31:47
Subject: Re: WAL Restore process during recovery

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group