Re: Cross-column statistics revisited

From: "Nathan Boley" <npboley(at)gmail(dot)com>
To: "Joshua Tolley" <eggyknap(at)gmail(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>
Subject: Re: Cross-column statistics revisited
Date: 2008-10-19 17:09:47
Message-ID: 6fa3b6e20810191009n8fb56b7xb34b4e783c7e0ff0@mail.gmail.com
Lists: pgsql-hackers

> I still need to go through backend/utils/adt/selfuncs.c
> to figure out exactly how we use the one-dimensional values.
>

Here's a page that helped me figure all this out.

http://www.postgresql.org/docs/8.1/static/planner-stats-details.html

>>
>> 2) Do we want to fold the MCV's into the dependence histogram? That
>> will cause problems in our copula approach but I'd hate to have to
>> keep an N^d histogram dependence relation in addition to the copula.
>
> Yeah, if we're already trying to figure out how to compress copulae,
> having also to compress MCV matrices seems painful and error-prone.
> But I'm not sure why it would cause problems to keep them in the
> copula -- is that just because we are most interested in the copula
> modeling the parts of the distribution that are most sparsely
> populated?
>

The problem I was thinking of is that we don't currently store the
true marginal distribution. As it stands, histograms only include
non-MCV values. So we would either need to handle the MCVs separately
(which would assume independence between MCV and non-MCV values) or
store multiple histograms.
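
To make the point concrete, here is a rough Python sketch of how a
one-dimensional marginal CDF might be pieced together from the two
separate structures. The function names and the linear interpolation
inside a bucket are assumptions for illustration, not the actual
selfuncs.c code; the inputs are modeled on the most_common_vals,
most_common_freqs and histogram_bounds columns of pg_stats.

```python
from bisect import bisect_right


def histogram_fraction_below(bounds, x):
    """Fraction of the histogram population below x.

    `bounds` is an equi-depth histogram: each bucket between consecutive
    bounds is assumed to hold the same share of the non-MCV rows.
    Linear interpolation inside a bucket is an illustrative assumption.
    """
    if x <= bounds[0]:
        return 0.0
    if x >= bounds[-1]:
        return 1.0
    n_buckets = len(bounds) - 1
    i = bisect_right(bounds, x) - 1          # bucket that contains x
    lo, hi = bounds[i], bounds[i + 1]
    within = (x - lo) / (hi - lo) if hi > lo else 0.5
    return (i + within) / n_buckets


def marginal_cdf(x, mcv_vals, mcv_freqs, bounds, null_frac=0.0):
    """Estimate P(col < x) by combining the MCV list with the histogram.

    The histogram only describes rows that are neither NULL nor in the
    MCV list, so its contribution is scaled by the remaining mass.
    """
    mcv_total = sum(mcv_freqs)
    mcv_below = sum(f for v, f in zip(mcv_vals, mcv_freqs) if v < x)
    hist_weight = 1.0 - mcv_total - null_frac
    return mcv_below + hist_weight * histogram_fraction_below(bounds, x)
```

The true marginal only exists after this combination step, which is
why a copula built directly on the stored histogram would be built on
the wrong marginal.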

>> 4) How will this approach deal with histogram buckets that have
>> scaling count sizes ( ie -0.4 )?
>
> I'm not sure what you mean here.
>

That was more a note to myself, and should have been numbered 3.5.
ndistinct estimates currently start to scale after a large enough
row/ndistinct ratio. If we try to model ndistinct, we need to deal
with scaling ndistinct counts somehow. But that's way off in the
future, so it was probably pointless to mention it.
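
For reference, a small Python sketch of what a scaling count like -0.4
means, assuming the pg_statistic convention that a negative
stadistinct is the negated ratio of distinct values to rows. The
function name is hypothetical.

```python
def resolve_ndistinct(stadistinct, reltuples):
    """Turn a stored ndistinct estimate into an absolute count.

    Assumed convention: a positive value is an absolute number of
    distinct values; a negative value is minus the fraction of rows
    that are distinct, so the count scales as the table grows.
    """
    if stadistinct < 0:
        return -stadistinct * reltuples
    return stadistinct
```

With stadistinct = -0.4, a table of 1000 rows is estimated to have
about 400 distinct values, and the estimate grows with the row count.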
