Re: Sort and index

From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dave Held <dave(dot)held(at)arrayservicesgrp(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Sort and index
Date: 2005-05-12 18:54:48
Message-ID: 45678156iqld6sd5n63baovuvqj73palmr@email.aon.at
Lists: pgsql-performance

On Wed, 11 May 2005 16:15:16 -0500, "Jim C. Nasby" <decibel(at)decibel(dot)org>
wrote:
>> This is divided by the number of index columns, so the index correlation
>> is estimated to be 0.219.
>
>That seems like a pretty bad assumption to make.

Any assumption we make without looking at entire index tuples has to be
bad. The new GUC variable secondary_correlation, introduced by my patch,
at least gives you a chance to manually control the effects of
additional index columns.

>> In my tests I got much more plausible results with
>>
>> 1 - (1 - abs(correlation))^2
>
>What's the theory behind that?

The same as for csquared -- pure intuition. But the numbers presented
in http://archives.postgresql.org/pgsql-hackers/2002-10/msg00072.php
seem to imply that in this case my intuition is better ;-)

Actually, the above formula was not proposed in that mail. AFAIR it
gives results between p2 and p3.
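
To make the difference between the two formulas concrete, here is a
minimal sketch that evaluates both for a few correlation values. It is
an illustration only, not planner code: the function names csquared()
and alternative() and the sample values are assumptions chosen for this
example, and p2 and p3 from the referenced mail are not reproduced here.

```python
def csquared(correlation: float) -> float:
    """correlation^2, the formula currently used (the 'quick hack')."""
    return correlation ** 2


def alternative(correlation: float) -> float:
    """1 - (1 - abs(correlation))^2, the formula discussed above."""
    return 1.0 - (1.0 - abs(correlation)) ** 2


def compare(values):
    """Return (correlation, csquared, alternative) for each input value."""
    return [(c, csquared(c), alternative(c)) for c in values]


if __name__ == "__main__":
    # Sample correlations; 0.219 is the estimate quoted earlier in the thread.
    print("  corr  csquared  alternative")
    for c, sq, alt in compare([0.0, 0.1, 0.219, 0.5, 0.9, 1.0]):
        print(f"{c:6.3f}  {sq:8.4f}  {alt:11.4f}")
```

Both formulas map a correlation in [-1, 1] to a weight in [0, 1] and
agree at 0 and 1. In between, the alternative is never smaller than
csquared, so a moderately correlated index gets noticeably more credit
for its ordering.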

>And I'd still like to know why correlation squared is used.

On Wed, 02 Oct 2002 18:48:49 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
|The indexCorrelation^2 algorithm was only a quick hack with no theory
|behind it :-(.

>It depends on the patches, since this is a production machine. Currently
>it's running 7.4.*mumble*,

The patch referenced in
http://archives.postgresql.org/pgsql-hackers/2003-08/msg00931.php is
still available. It doesn't touch too many places and should be easy to
review. I have been using it and its predecessors in production for more
than two years. Let me know if the 74b1 version does not apply cleanly
to your source tree.

Servus
Manfred
