Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	"Zeugswetter Andreas ADI SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>
Cc:	"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Neil Conway" <neilc(at)samurai(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject:	Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable
Date:	2008-01-29 10:55:38
Message-ID:	87r6g0vr2t.fsf@oxford.xeocode.com
Lists:	pgsql-hackers pgsql-patches

"Zeugswetter Andreas ADI SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at> writes:

> Sorry, but I don't grok this at all. Why the heck would we care if we have 2
> parts of the table perfectly clustered, because we started in the middle ?
> Surely our stats collector should recognize such a table as perfectly
> clustered. Does it not ? We are talking about one breakage in the readahead
> logic here, this should only bring the clustered property from 100% to some
> 99.99% depending on table size vs readahead window.

Well, clusteredness is used, or could be used, for a few different heuristics,
and not all of them would tolerate this as well as readahead does. But for
the most common application, namely trying to figure out whether index probes
for sequential ids will be sequential i/o or random i/o, you're right.

Currently the statistic we use to estimate this is the correlation of the
column value with the physical location on disk. That's not a perfect metric
for estimating how much random i/o would be needed to scan the table in index
order, though.
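
To make the gap concrete, here is a minimal sketch, assuming a plain Pearson
correlation between column value and physical row position (the real statistic
is computed from a sample, so this is only an approximation of the idea). The
names `pearson`, `N`, and `two_runs_values` are made up for the illustration.
It compares a fully sorted table with one that consists of two perfectly
sorted runs because the data starts in the middle.

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


N = 1000  # hypothetical number of rows
positions = list(range(N))  # physical row position on disk

# Case 1: the column value equals the physical position (fully sorted).
sorted_values = list(range(N))

# Case 2: two perfectly sorted runs; the table "starts in the middle".
two_runs_values = [(p + N // 2) % N for p in positions]

r_sorted = pearson(sorted_values, positions)
r_two_runs = pearson(two_runs_values, positions)

print(f"sorted table:   correlation = {r_sorted:.3f}")
print(f"two sorted runs: correlation = {r_two_runs:.3f}")
```

Under these assumptions the two-run table comes out near -0.5 even though an
index-order scan of it would involve only a single jump, which is the sense in
which correlation is an imperfect proxy for random i/o.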

It would be great if Postgres picked up a serious statistics geek who could
pipe up in discussions like this with "how about using the Euler-Jacobian
Centroid" or some such thing. If you have any suggestions of what metric to
use and how to calculate the info we need from it that would be great.

One suggestion from a long way back was scanning the index and counting how
many times the item pointer moves backward to an earlier block. That would
still require a full index scan, though. And it doesn't help for columns which
aren't indexed, though I'm not sure we need this info for those columns anyway.
It's also not clear how to extrapolate from that count the amount of random
access a given query would perform.
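
The backward-move count could be sketched roughly as follows. This is a toy
model, not Postgres code: it assumes the index scan is represented as a list
of heap block numbers in index order, and `ROWS_PER_BLOCK`,
`count_backward_moves`, and the three layouts are hypothetical names chosen
for the example.

```python
import random

ROWS_PER_BLOCK = 10  # hypothetical: rows that fit in one heap block


def count_backward_moves(blocks_in_index_order):
    """Count how often the next item pointer is in an earlier block."""
    moves = 0
    prev = None
    for block in blocks_in_index_order:
        if prev is not None and block < prev:
            moves += 1
        prev = block
    return moves


def blocks_for(positions_by_key):
    """Map each key's physical row position to its heap block number."""
    return [pos // ROWS_PER_BLOCK for pos in positions_by_key]


N = 1000  # hypothetical number of rows; keys are 0..N-1 in index order

# positions_by_key[k] is the physical row position of key k.
sorted_layout = list(range(N))
two_runs_layout = [(k + N // 2) % N for k in range(N)]
shuffled_layout = list(range(N))
random.Random(0).shuffle(shuffled_layout)

backward_sorted = count_backward_moves(blocks_for(sorted_layout))
backward_two_runs = count_backward_moves(blocks_for(two_runs_layout))
backward_shuffled = count_backward_moves(blocks_for(shuffled_layout))

print("sorted:   ", backward_sorted)
print("two runs: ", backward_two_runs)
print("shuffled: ", backward_shuffled)
```

In this toy model the sorted layout has no backward moves and the two-run
layout has exactly one, while the shuffled layout has hundreds. The count
separates the cases well, but as noted it needs a pass over the whole index
and does not by itself say how much random access a particular query incurs.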

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!
