Re: Cross-column statistics revisited

From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>
Cc: "Joshua Tolley" <eggyknap(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Cross-column statistics revisited
Date: 2008-10-16 17:34:59
Message-ID: 603c8f070810161034o8333bf3ka08a3230578022f6@mail.gmail.com
Lists: pgsql-hackers

> I think the real question is: what other kinds of correlation might
> people be interested in representing?

Yes, or to phrase that another way: What kinds of queries are being
poorly optimized now and why?

I suspect that a lot of the correlations people care about are
extreme. For example, it's fairly common for me to have a table where
column B is only used at all for certain values of column A. Say,
atm_machine_id is usually or always NULL unless transaction_type is
'ATM', or something like that. So a clause of the form transaction_type = 'ATM'
AND atm_machine_id < 10000 looks more selective than it really is,
because the first half is redundant: any row with a non-NULL
atm_machine_id already has transaction_type = 'ATM'.
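To make the arithmetic concrete, here is a toy model. It assumes, as a simplification, that the two clauses' selectivities are estimated separately and then multiplied as if the columns were independent. The table contents, the 'POS' transaction type and the row counts are all made up for illustration.

```python
# Toy model of the independence assumption on correlated columns.
# All data here is hypothetical.

def build_rows():
    rows = []
    # 1000 ATM transactions with machine ids 0, 20, ..., 19980
    for i in range(1000):
        rows.append(("ATM", i * 20))
    # 9000 non-ATM transactions whose atm_machine_id is NULL (None)
    for _ in range(9000):
        rows.append(("POS", None))
    return rows

rows = build_rows()
n = len(rows)

def selectivity(pred):
    # fraction of all rows (NULLs included) satisfying the predicate
    return sum(1 for r in rows if pred(r)) / n

def is_atm(r):
    return r[0] == "ATM"

def low_id(r):
    # NULL < 10000 is not true in SQL, so NULL rows fail the clause
    return r[1] is not None and r[1] < 10000

sel_type = selectivity(is_atm)      # 1000 / 10000
sel_id = selectivity(low_id)        # 500 / 10000
estimated = sel_type * sel_id       # independence assumption
actual = selectivity(lambda r: is_atm(r) and low_id(r))

print(f"estimated rows: {estimated * n:.0f}, actual rows: {actual * n:.0f}")
```

Here the independence estimate is about 50 rows while 500 actually match, a tenfold underestimate, precisely because every row passing the second clause also passes the first.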

The other half of this is that bad selectivity estimates only matter
if they're bad enough to change the plan, and I'm not sure whether
cases like this are actually a problem in practice.
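One way to see why a bad estimate may or may not matter is a deliberately crude cost comparison. This is a hypothetical sketch, not PostgreSQL's actual cost model: it borrows the default values of seq_page_cost, random_page_cost and cpu_tuple_cost, but the formulas, the 200-page table size and the function names are invented for illustration.

```python
# Crude, hypothetical cost model: does a misestimate flip the plan?
SEQ_PAGE_COST = 1.0     # default seq_page_cost
RANDOM_PAGE_COST = 4.0  # default random_page_cost
CPU_TUPLE_COST = 0.01   # default cpu_tuple_cost

def seq_scan_cost(total_rows, total_pages):
    # read every page sequentially and process every row
    return total_pages * SEQ_PAGE_COST + total_rows * CPU_TUPLE_COST

def index_scan_cost(matching_rows):
    # pessimistic: one random page fetch per matching row
    return matching_rows * (RANDOM_PAGE_COST + CPU_TUPLE_COST)

def choose_plan(matching_rows, total_rows=10000, total_pages=200):
    seq = seq_scan_cost(total_rows, total_pages)
    idx = index_scan_cost(matching_rows)
    return "index scan" if idx < seq else "seq scan"

# Estimate 50 vs actual 500: the error crosses the crossover point.
print(choose_plan(50), "vs", choose_plan(500))
# Estimate 5 vs actual 50: same tenfold error, same plan either way.
print(choose_plan(5), "vs", choose_plan(50))
```

Under these made-up numbers, a tenfold error changes the plan in one case and is harmless in the other, depending on whether it straddles the point where the two costs cross.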

...Robert
