cross column correlation revisted

From: PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Subject: cross column correlation revisted
Date: 2010-07-14 10:12:49
Message-ID: D0F6E707-701C-40C4-9F4B-D7D282AA0187@cybertec.at
Lists: pgsql-hackers

hello everybody,

we are currently facing some serious issues caused by cross-column correlation.
consider: 10% of all people have breast cancer. we have 2 genders (50:50).
if i select all the men with breast cancer, i will get basically nobody - but the planner treats the two conditions as independent (0.1 * 0.5 = 5% of all rows) and will overestimate the output.
this is the commonly known problem ...
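
to make the numbers concrete, here is a minimal python sketch of the overestimate. it assumes a synthetic 1000-row table in which every breast cancer case is a woman; the table, the `selectivity` helper and all variable names are made up for illustration and are not planner code.

```python
# synthetic table (an assumption for illustration): 1000 people,
# 50:50 gender split, 10% of all people have breast cancer, all of them women
rows = [("f", i < 100) for i in range(500)] + [("m", False) for _ in range(500)]


def selectivity(rows, pred):
    """fraction of rows matching a single-column predicate"""
    return sum(1 for r in rows if pred(r)) / len(rows)


sel_male = selectivity(rows, lambda r: r[0] == "m")   # fraction of men
sel_cancer = selectivity(rows, lambda r: r[1])        # fraction with cancer

# independence assumption: multiply the per-column selectivities
estimated_rows = len(rows) * sel_male * sel_cancer

# what the query really returns
actual_rows = sum(1 for gender, cancer in rows if gender == "m" and cancer)

print("estimated:", estimated_rows, "actual:", actual_rows)
```

the estimate comes out at about 5% of the table, while the real result is empty.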

this cross-column correlation problem can be quite nasty in many cases.
when the row count feeding a nested loop is underestimated, a join can turn into a never-ending nightmare.

my idea is the following:
what if we allow users to specify cross-column combinations for which we keep separate stats?
maybe somehow like this ...

ALTER TABLE x SET CORRELATION STATISTICS FOR (id = id2 AND id3=id4)

or ...

ALTER TABLE x SET CORRELATION STATISTICS FOR (x.id = y.id AND x.id2 = y.id2)

clearly we cannot store correlation for all combinations of all columns so we somehow have to limit it.
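
a rough python sketch of what "separate stats for a declared combination" could mean: count how often each value combination occurs for one user-chosen column group and look the answer up there instead of multiplying per-column selectivities. everything here (`build_group_stats`, `estimate_rows`, the synthetic table) is hypothetical and only illustrates the idea; real stats would be built from a sample and truncated, so they would not be exact like this.

```python
import math
from collections import Counter


def build_group_stats(rows, columns):
    """count each value combination for one declared column group"""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return {"columns": columns, "counts": counts, "total": len(rows)}


def estimate_rows(stats, values):
    """row estimate for an equality condition on all columns of the group"""
    return stats["counts"].get(values, 0)


# synthetic table (an assumption): 1000 people, 10% with cancer, all women
rows = [{"gender": "f", "cancer": i < 100} for i in range(500)]
rows += [{"gender": "m", "cancer": False} for _ in range(500)]

stats = build_group_stats(rows, ("gender", "cancer"))
men_with_cancer = estimate_rows(stats, ("m", True))
women_with_cancer = estimate_rows(stats, ("f", True))

# why this has to be limited: a 20-column table already has this many
# column pairs, before even looking at triples or cross-table combinations
pairs_for_20_columns = math.comb(20, 2)

print(men_with_cancer, women_with_cancer, pairs_for_20_columns)
```

keeping such counts only for combinations the user names keeps the storage bounded, which is the point of limiting it.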

what is the general feeling about something like that?

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
