
cross column correlation revisted

From:	PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To:	PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc:	Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject:	cross column correlation revisted
Date:	2010-07-14 10:12:49
Message-ID:	D0F6E707-701C-40C4-9F4B-D7D282AA0187@cybertec.at
Lists:	pgsql-hackers

hello everybody,

we are currently facing some serious problems with cross-column correlation.
consider: 10% of all people have breast cancer. we have 2 genders (50:50).
if i select all the men with breast cancer, i will get basically nobody, but the planner will overestimate the output: it treats the two conditions as independent and multiplies their selectivities (0.5 * 0.1 = 5% of the rows).
this is the commonly known problem ...
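
to put numbers on the example above, here is a minimal sketch in python. it is not planner code; the toy population (1000 people, every case of the condition among the women) and all variable names are assumptions made up for illustration.

```python
from fractions import Fraction

# hypothetical toy population: 1000 people, 50:50 genders.
# 10% of all people have the condition, and (an assumption for this
# illustration) every one of them is a woman.
people = [("f", i < 100) for i in range(500)] + [("m", False) for _ in range(500)]

total = len(people)

# per-column selectivities, as single-column statistics would describe them
sel_gender = Fraction(sum(1 for g, _ in people if g == "m"), total)  # 1/2
sel_cancer = Fraction(sum(1 for _, c in people if c), total)         # 1/10

# independence assumption: multiply the selectivities of both conditions
estimated_rows = sel_gender * sel_cancer * total

# what the query would really return
actual_rows = sum(1 for g, c in people if g == "m" and c)

print(estimated_rows, actual_rows)
```

in this toy data the independence estimate is 50 rows while the real result is 0, which is the kind of gap that separate cross-column stats would be meant to close.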

this cross-column correlation problem can be quite nasty in many cases.
when row counts are underestimated, a nested loop can turn a join into a never-ending nightmare, and so on.

my idea is the following:
what if we allow users to specify cross-column combinations for which we keep separate stats?
maybe somehow like this ...

ALTER TABLE x SET CORRELATION STATISTICS FOR (id = id2 AND id3=id4)

or ...

ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = y.id2)

clearly we cannot store correlation stats for all combinations of all columns, so we have to limit it somehow.
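
a rough sketch of why the limit is needed: the number of possible column sets grows very quickly with the number of columns. the helper name `column_combinations` and the 20-column table are hypothetical, chosen only to show the arithmetic (needs python 3.8+ for `math.comb`).

```python
from math import comb

def column_combinations(n_columns: int, max_width: int) -> int:
    """count the column sets of size 2..max_width drawn from n_columns columns"""
    return sum(comb(n_columns, k) for k in range(2, max_width + 1))

# hypothetical table with 20 columns
pairs_only = column_combinations(20, 2)    # only pairs of columns
all_subsets = column_combinations(20, 20)  # every set of two or more columns

print(pairs_only, all_subsets)
```

for 20 columns that is 190 pairs but more than a million sets overall, and that is before counting combinations across tables, so letting the user name the combinations that matter keeps the amount of stats bounded.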

what is the general feeling about something like that?

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de

Responses

Re: cross column correlation revisted at 2010-07-14 10:40:50 from Heikki Linnakangas
