From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: npboley(at)gmail(dot)com
Subject: new correlation metric
Date: 2008-10-26 08:38:02
Message-ID: 1225010283.21241.33.camel@localhost.localdomain
Lists: pgsql-hackers

Currently, we use correlation to estimate the I/O costs of an index
scan. However, this has some problems:

http://archives.postgresql.org/pgsql-hackers/2008-01/msg01084.php

I worked with Nathan Boley to come up with what we think is a better
metric for measuring this cost. It is based on the number of times,
walking the sample in value order, that you have to physically
backtrack (i.e., the data value increases, but the physical position
moves earlier in the table).

For example, if the table's physical order is

6 7 8 9 10 1 2 3 4 5

then the new metric will see one backtrack -- after reading 5 it
backtracks to the beginning to read 6. The old code would have produced
a correlation near -0.5. The new code yields a "correlation" of 8/9,
since only one of the nine transitions in value order moves backward.
The old code would have produced a run_cost of 1/4*min_IO_cost +
3/4*max_IO_cost (it interpolates using the square of the correlation,
and (-0.5)^2 = 1/4), while the new code produces a run_cost of
8/9*min_IO_cost + 1/9*max_IO_cost.
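To make the proposal concrete, here is a minimal sketch of the idea in
Python -- not the attached patch, just the backtrack count and the cost
interpolation described above, with min_IO_cost/max_IO_cost standing in
as placeholder parameters:

```python
# The example table's physical order of values, from the mail above.
physical = [6, 7, 8, 9, 10, 1, 2, 3, 4, 5]

# Walk the rows in value order, tracking each value's physical position.
order = sorted(range(len(physical)), key=lambda i: physical[i])

# A "backtrack" is a step where the physical position moves earlier.
backtracks = sum(1 for a, b in zip(order, order[1:]) if b < a)

n = len(physical)
# One backtrack out of n-1 = 9 transitions -> "correlation" of 8/9.
new_metric = 1 - backtracks / (n - 1)

def run_cost(frac, min_io, max_io):
    # Interpolate between the best and worst case I/O cost; in the
    # planner the endpoints come from the index scan cost model.
    return frac * min_io + (1 - frac) * max_io

print(backtracks, new_metric)  # 1 backtrack, metric 8/9
```

With this example, run_cost(new_metric, ...) weights min_IO_cost by
8/9, versus only 1/4 under the old squared-correlation interpolation.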

Attached is a patch. We replaced the computation behind "correlation"
but kept the name (even though it's now a misnomer). We can change it
if needed.

Comments?

Regards,
Jeff Davis

Attachment Content-Type Size
correlation.diff.gz application/x-gzip 1.8 KB
