Re: proposal : cross-column stats

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: proposal : cross-column stats
Date: 2010-12-19 20:21:43
Message-ID: 1292790103.1193.9461.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 2010-12-13 at 10:38 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > On Sun, Dec 12, 2010 at 9:16 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
> >> The proposed solution is based on contingency tables, built for selected
> >> groups of columns (not for each possible group). And the contingency
> >> table gives you the ability to estimate the probabilities needed to
> >> compute the selectivity. Or am I missing something?
>
> > Well, I'm not real familiar with contingency tables, but it seems like
> > you could end up needing to store a huge amount of data to get any
> > benefit out of it, in some cases.
>
> The reason that this wasn't done years ago is precisely that nobody's
> figured out how to do it with a tolerable amount of stats data and a
> tolerable amount of processing time (both at ANALYZE time and during
> query planning). It's not hard to see what we'd ideally like to do;
> it's getting from there to something useful in production that's hard.

I think we have to face up to the fact that attempting to derive
meaningful cross-column stats will require larger sample sizes.

If we collect everything we're going to have ~10^9 stats slots with
default stats_target 100 and a 100 column table.

We should disconnect sample size from histogram size, and we need to
make the initial column pairings vastly fewer than all combinations.
Manual specification seems like it will be required for the cases where
we decide not to include it automatically, so it seems we'll need manual
specification anyway. In that case, we should do manual specification
first.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message 3dmashup 2010-12-19 20:39:27 Re: Amazon now supporting GPU focused EC2 instances
Previous Message Magnus Hagander 2010-12-19 20:10:47 Re: pg_ctl and port number detection