Re: Cross-column statistics revisited

From:	"Robert Haas" <robertmhaas(at)gmail(dot)com>
To:	"Martijn van Oosterhout" <kleptog(at)svana(dot)org>
Cc:	"Joshua Tolley" <eggyknap(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Cross-column statistics revisited
Date:	2008-10-16 17:34:59
Message-ID:	603c8f070810161034o8333bf3ka08a3230578022f6@mail.gmail.com
Lists:	pgsql-hackers

> I think the real question is: what other kinds of correlation might
> people be interested in representing?

Yes, or to phrase that another way: What kinds of queries are being
poorly optimized now and why?

I suspect that a lot of the correlations people care about are
extreme. For example, it's fairly common for me to have a table where
column B is only used at all for certain values of column A. Say,
atm_machine_id is usually or always NULL unless transaction_type is
'ATM'. So a clause of the form transaction_type = 'ATM' AND
atm_machine_id < 10000 looks more selective than it really is: the
planner multiplies the two selectivities as if the columns were
independent, but the first half is redundant.
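
To make the size of the error concrete, here is a minimal sketch in Python rather than SQL. The table contents, the 'POS' transaction type, and the helper names are all made up for illustration; the only thing taken from the example above is the shape of the data (atm_machine_id is NULL unless transaction_type is 'ATM'). It compares the estimate obtained by multiplying the two per-column selectivities with the true selectivity of the combined clause.

```python
# Hypothetical toy table: 1000 transactions, every tenth one from an ATM.
# Non-ATM rows have no machine id (None stands in for SQL NULL).
def make_rows(n=1000):
    rows = []
    for i in range(n):
        if i % 10 == 0:
            rows.append(("ATM", i * 20))   # machine ids 0, 200, ..., 19800
        else:
            rows.append(("POS", None))
    return rows


def selectivity(rows, predicate):
    """Fraction of rows for which the predicate is true."""
    return sum(1 for row in rows if predicate(row)) / len(rows)


def is_atm(row):
    return row[0] == "ATM"


def low_machine_id(row):
    # A comparison against NULL is never true in SQL, so None fails the test.
    return row[1] is not None and row[1] < 10000


rows = make_rows()
sel_type = selectivity(rows, is_atm)
sel_id = selectivity(rows, low_machine_id)

# Estimate under the independence assumption: multiply the two fractions.
independent_estimate = sel_type * sel_id

# True selectivity of: transaction_type = 'ATM' AND atm_machine_id < 10000
actual = selectivity(rows, lambda row: is_atm(row) and low_machine_id(row))

print("independent estimate:", independent_estimate)
print("actual selectivity:  ", actual)
```

With this made-up data the combined clause matches exactly the rows that atm_machine_id < 10000 matches on its own, so the multiplied estimate is too low by a factor of 1 / sel_type, which is ten here.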

The other half of this is that bad selectivity estimates only matter
if they're bad enough to change the plan, and I'm not sure whether
cases like this are actually a problem in practice.
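
A rough way to picture this, using a deliberately simplistic cost model that is not PostgreSQL's: the constants and the function name below are assumptions chosen only for illustration. The plan choice depends on which side of a threshold the estimated selectivity falls, so an estimate that is off by a factor of ten can still produce the same plan.

```python
# Toy cost model (hypothetical constants, not the real planner's).
TOTAL_ROWS = 1000
SEQ_COST_PER_ROW = 1.0      # assumed cost to read one row sequentially
INDEX_COST_PER_ROW = 4.0    # assumed cost to fetch one row via an index


def choose_plan(estimated_selectivity):
    """Pick the cheaper of a sequential scan and an index scan."""
    seq_cost = TOTAL_ROWS * SEQ_COST_PER_ROW
    index_cost = estimated_selectivity * TOTAL_ROWS * INDEX_COST_PER_ROW
    return "index scan" if index_cost < seq_cost else "seq scan"


# A tenfold misestimate that stays on the same side of the threshold
# leaves the plan unchanged; only a much larger selectivity flips it.
for sel in (0.005, 0.05, 0.5):
    print(sel, "->", choose_plan(sel))
```

In this sketch the crossover sits at a selectivity of 0.25, so both 0.005 and 0.05 pick the index scan. Whether real cases of this kind land near a crossover is exactly the open question.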

...Robert

In response to

Re: Cross-column statistics revisited at 2008-10-16 17:11:26 from Martijn van Oosterhout

Responses

Re: Cross-column statistics revisited at 2008-10-16 17:50:49 from Martijn van Oosterhout
Re: Cross-column statistics revisited at 2008-10-16 20:35:25 from Ron Mayer
Re: Cross-column statistics revisited at 2008-10-16 22:12:18 from Josh Berkus
