## Re: Cross-table statistics idea

From: "Jim C(dot) Nasby" Simon Riggs pgsql-hackers(at)postgresql(dot)org Re: Cross-table statistics idea 2006-09-27 16:35:48 20060927163548.GM19827@nasby.net (view raw or whole thread) 2006-09-27 02:27:28 from "Jim C(dot) Nasby"  2006-09-27 02:42:46 from Tom Lane  2006-09-27 11:30:43 from Simon Riggs   2006-09-27 16:35:48 from "Jim C(dot) Nasby" pgsql-hackers
```On Wed, Sep 27, 2006 at 12:30:43PM +0100, Simon Riggs wrote:
> On Tue, 2006-09-26 at 21:27 -0500, Jim C. Nasby wrote:
> > Since I don't recall any ideas ever having been thrown out on how to do
> > this...
> >
> > ISTM that we could gain additional insight on how many rows would likely
> > result from a join
>
> One thing we can do is to use cross-column relationships to improve the
> estimation of Ndistinct.
>
> If we have a table
> Order_line (PK orderId, lineNum)
>
> If we look at lineNum and see it has on average 10 values we can then
> use this information to compute that Ndistinct should be -0.1, i.e. the
> number of values is proportional to the number of rows with a factor of
> 10.
>
> Right now if there are more than 10 lineNums per orderId on average then
> we never decide that orderId is a scalable statistic.
>
> I propose adding a final step to ANALYZE that applies a cross-column
> rule after all columns have been analysed. If all except one column of a
> PK have very low Ndistinct we can use that to calculate a minimum number
> of Ndistinct for the column with a high number of values. If that
> minimum number is less than the Ndistinct estimate in isolation, then we
> overlay the new value.
>
> This is desirable because the estimation of Ndistinct is very sensitive
> to the number of matching rows in the sample, so Ndistinct estimates are
> usually very poor for large Ndistinct. The estimates for low Ndistinct
> are much better, so we can use them with a lower standard error to
> correct the in-isolation estimate of other columns.

But wouldn't overlaying the value screw us if we wanted to look up
something based on the unique field? (ie: if there was a line_id in
order_line and we wanted to look something up based on line_id).
--
Jim Nasby                                            jim(at)nasby(dot)net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

```

### pgsql-hackers by date

 Next: From: Rocco Altier Date: 2006-09-27 16:53:33 Subject: Buildfarm & cvsignore files Previous: From: Kris Jurka Date: 2006-09-27 16:30:21 Subject: Re: Buildfarm alarms