
Re: Multi-Dimensional Histograms

From:	Nathan Boley <npboley(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Multi-Dimensional Histograms
Date:	2009-06-30 00:17:00
Message-ID:	6fa3b6e20906291717n1596ecc4qe7165d16018f4dfe@mail.gmail.com
Lists:	pgsql-hackers

On Mon, Jun 29, 2009 at 3:43 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> David Fetter <david(at)fetter(dot)org> writes:
>> On Mon, Jun 29, 2009 at 01:28:01PM -0700, Nathan Boley wrote:
>>> ... They dismiss
>>> singular value decomposition and the discrete wavelet transform as
>>> being too parametric ( which is silly, IMHO )
>
>> Should we have a separate discussion about eigenvalues? Wavelets?
>
> I think it'd be a short discussion: what will you do with non-numeric
> datatypes? We probably don't really want to assume anything stronger
> than that the datatype has a total ordering.

Well, in the general case, we could use their ranks.
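
To make the rank idea concrete, here is a minimal sketch in Python (not PostgreSQL code). It assumes only that the values can be compared with `<`; the function name `dense_ranks` and the sample zip-code strings are made up for illustration.

```python
from bisect import bisect_left

def dense_ranks(values):
    """Replace each value by its rank among the distinct values.

    Only a total ordering on the values is needed; the output is
    integers, so numeric techniques can be applied to the ranks.
    """
    distinct = sorted(set(values))          # relies only on the ordering
    return [bisect_left(distinct, v) for v in values]

# Hypothetical sample: text values ordered lexicographically.
zips = ["94110", "10001", "60614", "10001"]
print(dense_ranks(zips))
```

Note that the ranks inherit whatever ordering the datatype defines, so they are only as informative as that ordering.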

At the end of the day, we can't do any dimension reduction unless the
ordering encodes some sort of useful information, and the data type
being in R^n is certainly no guarantee. Consider, for instance, the
cross correlation of zip-codes and area codes - you would really want
to order those by some geographic relation. I think that is why
cross-column stats are so hard in the general case.

That being said, for geographic data in particular, PCA or similar
could work well.
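
As a rough illustration of what PCA would buy for two correlated numeric columns, here is a sketch in plain Python. It is not a proposal for the planner; the function name `pca_2d` and the sample points are hypothetical. It finds the principal axis of a set of 2-D points and reports how much of the total variance that single axis explains. When that fraction is close to 1, a one-dimensional histogram along the axis captures most of the joint distribution.

```python
import math

def pca_2d(points):
    """Return (unit direction of first principal axis, fraction of variance explained)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix.
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Closed form for a symmetric 2x2 matrix: angle of the leading eigenvector.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))
    # Largest eigenvalue, then its share of the total variance.
    largest = (sxx + syy) / 2 + math.hypot((sxx - syy) / 2, sxy)
    total = sxx + syy
    explained = largest / total if total > 0 else 1.0
    return direction, explained

# Hypothetical points lying exactly on a line, so one axis explains everything.
pts = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
direction, explained = pca_2d(pts)
print(direction, explained)
```

This only makes sense when distances between values mean something, which is the case for coordinates but not for arbitrary ordered types.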

-Nathan

In response to

Re: Multi-Dimensional Histograms at 2009-06-29 22:43:35 from Tom Lane

Responses

Re: Multi-Dimensional Histograms at 2009-06-30 02:22:15 from Robert Haas
