Re: Subquery flattening causing sequential scan

From: Ondrej Ivanič pgsql-performance(at)postgresql(dot)org Re: Subquery flattening causing sequential scan 2011-12-27 22:21:00 CAM6mieL3XY25gGQacD7EYnWg9z-P2=kAEN_15xAQvic=LQTa7w@mail.gmail.com (view raw, whole thread or download thread mbox) 2011-12-27 17:29:14 from Jim Crate  2011-12-27 18:12:46 from Tom Lane   2011-12-27 22:21:00 from Ondrej Ivanič    2011-12-28 00:28:37 from Tom Lane   2011-12-28 08:30:59 from Віталій Тимчишин   2011-12-28 17:22:49 from Jim Crate  2012-01-11 00:19:00 from Robert Haas pgsql-performance
```Hi,

On 28 December 2011 05:12, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Possibly raising the stats target on emsg_messages would help.

In the function std_typanalyze() is this comment:

/*--------------------
* The following choice of minrows is based on the paper
* "Random sampling for histogram construction: how much is enough?"
* by Surajit Chaudhuri, Rajeev Motwani and Vivek Narasayya, in
* Proceedings of ACM SIGMOD International Conference on Management
* of Data, 1998, Pages 436-447.  Their Corollary 1 to Theorem 5
* says that for table size n, histogram size k, maximum relative
* error in bin size f, and error probability gamma, the minimum
* random sample size is
*      r = 4 * k * ln(2*n/gamma) / f^2
* Taking f = 0.5, gamma = 0.01, n = 10^6 rows, we obtain
*      r = 305.82 * k
* Note that because of the log function, the dependence on n is
* quite weak; even at n = 10^12, a 300*k sample gives <= 0.66
* bin size error with probability 0.99.  So there's no real need to
* scale for n, which is a good thing because we don't necessarily
* know it at this point.
*--------------------
*/

The question is why the parameter f is not exposed as a GUC? Sometimes
it could make sense to have few bins with better estimation (for same
r).

--
Ondrej Ivanic
(ondrej(dot)ivanic(at)gmail(dot)com)

```

pgsql-performance by date

 Next: From: Merlin Moncure Date: 2011-12-27 22:54:17 Subject: Re: Performance costs of various PL languages Previous: From: Pavel Stehule Date: 2011-12-27 22:20:11 Subject: Re: Performance costs of various PL languages