Re: cross column correlation revisted

From:	Yeb Havinga <yebhavinga(at)gmail(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	Re: cross column correlation revisted
Date:	2010-07-14 11:21:19
Message-ID:	4C3D9DAF.8040807@gmail.com
Lists:	pgsql-hackers

Heikki Linnakangas wrote:
> However, the problem is how to represent and store the
> cross-correlation. For fields with low cardinality, like "gender" and
> boolean "breast-cancer-or-not" you can count the prevalence of all the
> different combinations, but that doesn't scale. Another often cited
> example is zip code + street address. There's clearly a strong
> correlation between them, but how do you represent that?
>
> For scalar values we currently store a histogram. I suppose we could
> create a 2D histogram for two columns, but that doesn't actually help
> with the zip code + street address problem.
In my head the neuron for 'principal component analysis' went on while
reading this. Back in college it was used to prepare input data before
feeding it into a neural network. Maybe ideas from PCA could be helpful?
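
To make the PCA idea a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration, not a proposal for how the planner would store anything: the function name pca_2d and the synthetic data are made up for this example. It computes the 2x2 covariance matrix of two numeric columns and reports how much of the total variance the first principal component explains. A ratio near 1.0 suggests the two columns are almost linearly dependent, while a ratio near 0.5 suggests they are roughly uncorrelated with similar variance.

```python
import math
import random


def pca_2d(xs, ys):
    """PCA on two numeric columns (hypothetical helper for illustration).

    Returns ((l1, l2), ratio) where l1 >= l2 are the eigenvalues of the
    sample covariance matrix and ratio is the share of total variance
    explained by the first principal component.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    trace = sxx + syy
    det = sxx * syy - sxy ** 2
    disc = math.sqrt(max(trace * trace / 4 - det, 0.0))
    l1 = trace / 2 + disc
    l2 = trace / 2 - disc
    return (l1, l2), l1 / trace


# Synthetic data (an assumption of this sketch, not real table contents).
random.seed(0)
a = [random.gauss(0, 1) for _ in range(1000)]
correlated = [x + random.gauss(0, 0.1) for x in a]      # nearly a copy of a
independent = [random.gauss(0, 1) for _ in range(1000)]  # unrelated to a

_, ratio_corr = pca_2d(a, correlated)
_, ratio_ind = pca_2d(a, independent)
print("correlated pair: first component explains %.3f" % ratio_corr)
print("independent pair: first component explains %.3f" % ratio_ind)
```

Note that this only captures linear correlation between numeric columns, so by itself it would not cover categorical pairs such as zip code + street address.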

regards,
Yeb Havinga

In response to

Re: cross column correlation revisted at 2010-07-14 10:40:50 from Heikki Linnakangas

Responses

Re: cross column correlation revisted at 2010-07-14 13:54:57 from Joshua Tolley
