Re: seq scan cache vs. index cache smackdown

From: PFC <lists(at)boutiquenumerique(dot)com>
To: "Magnus Hagander" <mha(at)sollentuna(dot)net>,"Merlin Moncure" <merlin(dot)moncure(at)rcsonline(dot)com>,"Josh Berkus" <josh(at)agliodbs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: seq scan cache vs. index cache smackdown
Date: 2005-02-15 21:55:32
Message-ID: opsl9duu0vth1vuj@musicbox
Lists: pgsql-performance
	In the 'wishful hand waving' department :

	read index -> determine (tuple id, page) to hit in table -> for each of
these, tell the OS 'I'm gonna need these' via a NON-BLOCKING call. Non-blocking
because you feed the information to the OS as you read the index,
streaming it.
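
	The closest thing that exists today is probably posix_fadvise() with POSIX_FADV_WILLNEED, which starts a non-blocking read into the page cache. It has no callback and no 'I'm full' answer, so it is only a rough sketch of the first half of the idea. The 8192-byte page size and the helper names hint_pages and demo are assumptions for illustration, and it needs a platform where Python exposes os.posix_fadvise (Linux, for instance).

```python
import os
import tempfile

PAGE_SIZE = 8192  # assumed page size for this sketch


def hint_pages(fd, index_entries):
    """Stream read-ahead hints to the OS while walking an index.

    index_entries yields (tuple_id, page_no) pairs.
    Returns the number of hints sent.
    """
    hinted = 0
    for _tuple_id, page_no in index_entries:
        # POSIX_FADV_WILLNEED asks the kernel to start reading this range
        # into the page cache; the call does not wait for the data.
        os.posix_fadvise(fd, page_no * PAGE_SIZE, PAGE_SIZE,
                         os.POSIX_FADV_WILLNEED)
        hinted += 1
    return hinted


def demo():
    # Hypothetical 16-page "table" file, hinted in index order.
    with tempfile.TemporaryFile() as f:
        f.write(b"\0" * (PAGE_SIZE * 16))
        f.flush()
        entries = [(1, 7), (2, 3), (3, 12)]
        return hint_pages(f.fileno(), entries)
```

	The application would still have to issue ordinary reads afterwards; the hint only raises the chance that the pages are already cached by then.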

	Meanwhile, the OS accumulates the requests in an internal FIFO,
reorders them to suit good disk head movement, then reads them in
clusters, and calls a callback inside the application when it has data
available. Or the application polls it once in a while to get a
bucketload of pages. The 'I'm gonna need these' syscall would also
sometimes return 'hey, I'm full, read the pages I have here waiting for
you before asking for new ones'.
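
	To make the queue behaviour concrete, here is a toy user-space model, not a real kernel interface. PrefetchQueue, submit, poll and scan are made-up names, and sorting by page number stands in for 'the order best suited to disk head movement', which is a simplifying assumption. No actual I/O happens.

```python
class PrefetchQueue:
    """Toy model of the request FIFO described above."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pending = []  # (page_no, tuple_id) in arrival order

    def submit(self, page_no, tuple_id):
        """Non-blocking: return False ('hey, I'm full') instead of waiting."""
        if len(self.pending) >= self.capacity:
            return False
        self.pending.append((page_no, tuple_id))
        return True

    def poll(self):
        """Hand back everything pending, sorted by page number."""
        batch = sorted(self.pending)
        self.pending = []
        return batch


def scan(index_entries, capacity):
    """Feed (tuple_id, page_no) pairs to the queue, draining it when full."""
    q = PrefetchQueue(capacity)
    served = []
    for tuple_id, page_no in index_entries:
        if not q.submit(page_no, tuple_id):
            # Queue is full: consume the waiting pages, then retry.
            served.extend(q.poll())
            q.submit(page_no, tuple_id)
    served.extend(q.poll())
    return served
```

	With a capacity of 3 and pages requested in the order 9, 2, 5, 1, the first three come back as 2, 5, 9 and the last one arrives in a second batch.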

	A flag would tell the OS whether the application wants the results in
any order, or with order preserved.
	Without order preservation, if the application has requested the same
page twice with different tuple ids, the OS would call the callback only
once, giving it a list of the tuple ids associated with that page.
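
	A small sketch of what that flag could mean for the callbacks. The function name deliver is hypothetical, and I am assuming that with order preserved you get one callback per request, which the description above does not actually specify.

```python
def deliver(requests, preserve_order):
    """Model the callbacks the application would receive.

    requests: list of (page_no, tuple_id) in the order they were asked for.
    Returns a list of (page_no, [tuple_ids]) callback invocations.
    """
    if preserve_order:
        # Assumption: one callback per request, in request order.
        return [(page_no, [tuple_id]) for page_no, tuple_id in requests]

    # Any order is acceptable, so group the tuple ids by page and
    # deliver each page once, in ascending page order.
    grouped = {}
    for page_no, tuple_id in requests:
        grouped.setdefault(page_no, []).append(tuple_id)
    return sorted(grouped.items())
```

	Page 7 requested twice with tuple ids 'a' and 'c' yields a single callback carrying both ids when order does not matter, and two separate callbacks when it does.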

	It involves a tradeoff between memory and performance: as the size of
the FIFO increases, the likelihood of good contiguous disk reading
increases. However, the memory structure would only contain page numbers
and tuple ids, so it could be pretty small.

	Returning the results in-order would also need more memory.

	It could be made very generic if instead of 'tuple id' you read 'opaque  
application data', and instead of 'page' you read '(offset, length)'.
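
	In that generic form, one obvious thing the scheduler could do is merge requests whose byte ranges touch or overlap into a single read, carrying along everybody's opaque data. A minimal sketch, where merge_ranges is a hypothetical name and the merging rule is my assumption rather than part of the proposal:

```python
def merge_ranges(requests):
    """Merge touching or overlapping (offset, length) requests.

    requests: iterable of (offset, length, opaque).
    Returns a list of (offset, length, [opaque, ...]) sorted by offset.
    """
    merged = []
    # Sort on offset and length only, so opaque values never get compared.
    for offset, length, opaque in sorted(requests, key=lambda r: (r[0], r[1])):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # This range starts inside or right at the end of the previous one.
            start, old_len, payloads = merged[-1]
            end = max(start + old_len, offset + length)
            merged[-1] = (start, end - start, payloads + [opaque])
        else:
            merged.append((offset, length, [opaque]))
    return merged
```

	Two adjacent 8192-byte requests at offsets 0 and 8192 collapse into one 16384-byte read that carries both pieces of application data, while a request further away stays separate.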

	This structure actually exists already in the Linux Kernel, it's called  
the Elevator or something, but it works for scheduling reads between  

	You can also read 'internal not yet developed postgres cache manager'  
instead of OS if you don't feel like talking kernel developers into  
implementing this thing.

> (Those are ReadFileScatter and WriteFileGather)
