Re: Cross-column statistics revisited

From: Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Joshua Tolley <eggyknap(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cross-column statistics revisited
Date: 2008-10-16 17:31:48
Message-ID: B71B9E9E-3F8D-48B2-9D99-A342AB043322@enterprisedb.com
Lists: pgsql-hackers

[sorry for top posting - damn phone]

It's pretty straightforward to do a chi-squared test on all the pairs.
But that only tells you that the product is more likely to be wrong. It
doesn't tell you whether it's going to be too high or too low...
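
To make that concrete, here is a minimal Python sketch. It is not planner code: the helper names (chi_squared_2x2, make_pairs, product_and_actual) and the row counts are made up for illustration. It computes the Pearson chi-squared statistic for a 2x2 table of two boolean clauses, then builds two tables with identical marginals and identical statistics where the product estimate is wrong in opposite directions.

```python
def chi_squared_2x2(pairs):
    """Pearson chi-squared statistic for a list of (p, q) boolean pairs."""
    n = len(pairs)
    obs = [[0, 0], [0, 0]]          # observed counts, indexed [p][q]
    for p, q in pairs:
        obs[int(p)][int(q)] += 1
    row = [sum(obs[0]), sum(obs[1])]
    col = [obs[0][0] + obs[1][0], obs[0][1] + obs[1][1]]
    stat = 0.0
    for i in (0, 1):
        for j in (0, 1):
            expected = row[i] * col[j] / n   # count expected under independence
            if expected > 0:
                stat += (obs[i][j] - expected) ** 2 / expected
    return stat


def make_pairs(n11, n10, n01, n00):
    """Hypothetical helper: expand four cell counts into a list of rows."""
    return ([(True, True)] * n11 + [(True, False)] * n10 +
            [(False, True)] * n01 + [(False, False)] * n00)


def product_and_actual(pairs):
    """Return (product of the two selectivities, true joint selectivity)."""
    n = len(pairs)
    sel_p = sum(1 for p, _ in pairs if p) / n
    sel_q = sum(1 for _, q in pairs if q) / n
    actual = sum(1 for p, q in pairs if p and q) / n
    return sel_p * sel_q, actual


# Same marginals (50% each), opposite direction of correlation.
positive = make_pairs(40, 10, 10, 40)   # clauses tend to be true together
negative = make_pairs(10, 40, 40, 10)   # clauses tend to exclude each other

for name, pairs in (("positive", positive), ("negative", negative)):
    print(name, chi_squared_2x2(pairs), product_and_actual(pairs))
```

Both tables produce the same statistic, and the product estimate is 0.25 for both, but the true joint fraction is 0.40 in the first table and 0.10 in the second. The test flags the dependence without giving its sign.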

greg

On 16 Oct 2008, at 07:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
>> I think you need to go a step back: how are you going to use this
>> data?
>
> The fundamental issue as the planner sees it is not having to assume
> independence of WHERE clauses. For instance, given
>
> WHERE a < 5 AND b > 10
>
> our current approach is to estimate the fraction of rows with a < 5
> (using stats for a), likewise estimate the fraction with b > 10
> (using stats for b), and then multiply these fractions together.
> This is correct if a and b are independent, but can be very bad if
> they aren't. So if we had joint statistics on a and b, we'd want to
> somehow match that up to clauses for a and b and properly derive
> the joint probability.
>
> (I'm not certain of how to do that efficiently, even if we had the
> right stats :-()
>
> regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
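
For reference, the estimate described in the quoted message can be sketched in a few lines of Python. This is an illustration only, not how the planner is implemented: the real planner works from stored per-column statistics rather than scanning rows, and the toy tables and the names selectivity and compare are assumptions made for the example.

```python
def selectivity(rows, predicate):
    """Fraction of rows for which predicate(row) is true."""
    return sum(1 for r in rows if predicate(r)) / len(rows)


def compare(rows):
    """Return (independence estimate, true fraction) for a < 5 AND b > 10."""
    sel_a = selectivity(rows, lambda r: r[0] < 5)    # fraction with a < 5
    sel_b = selectivity(rows, lambda r: r[1] > 10)   # fraction with b > 10
    independent = sel_a * sel_b                      # assumes a, b independent
    actual = selectivity(rows, lambda r: r[0] < 5 and r[1] > 10)
    return independent, actual


# Two toy tables of (a, b) rows in which b is fully determined by a.
same = [(a, a) for a in range(20)]           # b = a
mirrored = [(a, 20 - a) for a in range(20)]  # b = 20 - a

print("b = a      ", compare(same))
print("b = 20 - a ", compare(mirrored))
```

With b = a no row can satisfy both clauses, so the product overestimates. With b = 20 - a every row with a < 5 also has b > 10, so the product underestimates.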
