Re: new correlation metric

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org, npboley(at)gmail(dot)com
Subject: Re: new correlation metric
Date: 2008-10-26 15:51:02
Message-ID: 11255.1225036262@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> On Sun, Oct 26, 2008 at 01:38:02AM -0700, Jeff Davis wrote:
>> I worked with Nathan Boley to come up with what we think is a better
>> metric for measuring this cost.

> I think the code is in the right direction, but I think want you want
> is some kind of estimate of "given I've looked for tuple X, how many
> tuples in the next k pages are near this one".

ISTM that some experimental studies would be required to justify any
proposal in this area.

Having said that ... one of the things I never liked about the existing
code is that it pays no attention to block-boundary effects. It doesn't
matter to an indexscan how badly tuples within a single block are
misordered --- what matters is how many block reads you have to do.
So I think that any changes here ought to include measuring the
correlation on the basis of block numbers not tuple numbers. But what's
not too clear to me is whether our sampling methods would mess that up.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2008-10-26 15:57:47 Re: new correlation metric
Previous Message Jeff 2008-10-26 15:47:24 Re: again...