Re: Correlation in cost_index()

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Sean Chittenden <sean(at)chittenden(dot)org>
Cc: Manfred Koizar <mkoi-pg(at)aon(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Correlation in cost_index()
Date: 2003-08-08 22:25:41
Message-ID: 1418.1060381541@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Sean Chittenden <sean(at)chittenden(dot)org> writes:
> indexCorrelation is 1.0 for the 1st key in a multi-column index.

... only if it's perfectly correlated.

> As things stand, however, if a multi-column key is
> used, the indexCorrelation is penalized by the size of the number of
> keys found in the multi-column index. As things stand the qual
> user_id = 42, on a CLUSTER'ed multi-column index (user_id,utc_date)
> has an indexCorrelation of 0.5, when in fact the correlation is 1.0.

Right, in the perfectly-correlated case this calculation is clearly
wrong. However, what of cases where the first column shows good
correlation with the physical ordering, but the second does not?

The nasty part of this is that the correlation stat that ANALYZE
computed for the second column is of no value to us. Two examples:

X Y X Y

A A A B
A B A C
A C A A
B A B A
B B B C
B C B B
C A C C
C B C A
C C C B

In both cases ANALYZE will calculate correlation 1.0 for column X,
and something near zero for column Y. We would like to come out with
index correlation 1.0 for the left-hand case and something much less
(but, perhaps, not zero) for the right-hand case. I don't really see
a way to do this without actually examining the multi-column ordering
relationship during ANALYZE.

> I tossed a different index on my test table to see how well things
> fare with a low correlation, and this was a bit disturbing:

Seems like most of the error in that estimate has to do with the poor
rowcount estimation. There's very little percentage in trying to
analyze the effect of index correlation in examples where we don't have
the first-order stats correct ...

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joe Conway 2003-08-08 22:26:30 Re: [HACKERS] IS OF
Previous Message Sean Chittenden 2003-08-08 22:10:06 Re: Correlation in cost_index()