Re: WIP: cross column correlation ...

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, Robert Haas <robertmhaas(at)gmail(dot)com>, Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>, Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject: Re: WIP: cross column correlation ...
Date: 2011-02-26 18:58:20
Message-ID: 20110226185819.GA22407@svana.org
Lists: pgsql-hackers

On Sat, Feb 26, 2011 at 06:44:52PM +0000, Greg Stark wrote:
> 2011/2/26 PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>:
> > what we are trying to do is to explicitly store column correlations. so, a histogram for (a, b) correlation and so on.
>
> The problem is that we haven't figured out how to usefully store a
> histogram for <a,b>. Consider the oft-quoted example of a
> <city,postal-code> -- or <city,zip code> for Americans. A histogram
> of the tuple is just the same as a histogram on the city.

But there are cases where it can work. Frankly, the example you mention
is odd, because we can't even build useful 1D histograms for <city>
and <zip code>, so the fact that 2D is hard is not surprising.

The histograms we do build work fine for > and <, just not for
equality. A 2D histogram would handle the same cases.
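
To illustrate the point about range predicates, here is a minimal sketch
of an equi-depth 1D histogram and a "<" selectivity estimate made from
its bucket boundaries. It is a simplified illustration, not the
planner's actual code; the function names (equi_depth_bounds,
estimate_less_than) and the bucket count are assumptions made up for
this example.

```python
import bisect
import random


def equi_depth_bounds(values, buckets):
    """Return buckets + 1 boundaries; each bucket holds roughly the same number of rows."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[min(n - 1, (i * n) // buckets)] for i in range(buckets + 1)]


def estimate_less_than(bounds, x):
    """Estimate the fraction of rows with value < x from the boundaries alone."""
    buckets = len(bounds) - 1
    if x <= bounds[0]:
        return 0.0
    if x >= bounds[-1]:
        return 1.0
    # Index of the bucket containing x: bounds[i] <= x < bounds[i + 1].
    i = bisect.bisect_right(bounds, x) - 1
    lo, hi = bounds[i], bounds[i + 1]
    # Assume rows are spread evenly inside the bucket (linear interpolation).
    inside = (x - lo) / (hi - lo)
    return (i + inside) / buckets


# Hypothetical sample column: 10,000 normally distributed values.
rng = random.Random(0)
column = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
bounds = equi_depth_bounds(column, buckets=100)

cutoff = 0.5
estimated = estimate_less_than(bounds, cutoff)
actual = sum(1 for v in column if v < cutoff) / len(column)
```

With equal-depth buckets, the range estimate can be off by at most
about one bucket's share of the rows. The boundaries say nothing about
how many rows carry one particular value, which is why the same
structure does not help with equality.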

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
> - Charles de Gaulle
