In the 'wishful hand waving' department:
read index -> determine the (tuple id, page) pairs to hit in the table ->
for each of these, tell the OS 'I'm gonna need these' via a NON-BLOCKING
call. Non-blocking because you feed the information to the OS as you read
the index. Meanwhile, the OS accumulates the requests in an internal FIFO,
reorders them to get the best disk head movement, then reads them in
clusters, and calls a callback inside the
application when it has data available. Or the application polls it once
in a while to get a bucketload of pages. The 'I'm gonna need these()'
syscall would also sometimes return 'hey, I'm full, read the pages I have
here waiting for you before asking for new ones'.
A flag would tell the OS if the application wanted the results in any
order, or with order preserved.
Without order preservation, if the application has requested the same
page twice with different tuple ids, the OS would call the callback only
once, giving it the list of tuple ids associated with that page.
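
To make the unordered mode a bit more concrete, here is a toy Python model
of the request queue. It is only a sketch: the names PrefetchQueue, request
and drain are made up for illustration, sorting by page number stands in
for real disk head scheduling, and no actual I/O happens.

```python
class PrefetchQueue:
    """Toy model of the 'I'm gonna need these' queue (unordered mode)."""

    def __init__(self, capacity):
        self.capacity = capacity   # max number of distinct pending pages
        self.pending = {}          # page number -> list of tuple ids

    def request(self, page, tuple_id):
        """Non-blocking submit; returns False when the queue is full."""
        if page in self.pending:
            # Same page asked for again: just remember the extra tuple id.
            self.pending[page].append(tuple_id)
            return True
        if len(self.pending) >= self.capacity:
            return False           # 'hey, I'm full, drain me first'
        self.pending[page] = [tuple_id]
        return True

    def drain(self):
        """Hand back pending pages in ascending page order (one sweep)."""
        batch = sorted(self.pending.items())
        self.pending = {}
        return batch


q = PrefetchQueue(capacity=3)
for page, tid in [(7, "a"), (2, "b"), (7, "c"), (5, "d")]:
    q.request(page, tid)
full = q.request(9, "e")   # False: three distinct pages already pending
batch = q.drain()          # [(2, ['b']), (5, ['d']), (7, ['a', 'c'])]
```

Page 7 comes back once, carrying both tuple ids, which is the 'call the
callback only once' behaviour described above.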
It involves a tradeoff between memory and performance: as the size of
the FIFO increases, the likelihood of good contiguous disk reads
increases. However, the memory structure would only contain page numbers
and tuple ids, so it could be pretty small.
Returning the results in-order would also need more memory.
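
A rough Python sketch of why in-order delivery costs more memory: pages
that the disk finishes early have to be held until everything requested
before them has arrived. The function name deliver_in_order and the
sequence numbers are assumptions made for this illustration.

```python
def deliver_in_order(completions):
    """completions: (seq, page) pairs in the order the disk finished them,
    where seq is the position of the request in the original stream.
    Returns the pairs in request order plus the peak number of pages
    that had to be held back in memory."""
    held = {}        # seq -> page, finished but not yet deliverable
    next_seq = 0     # next request the application is waiting for
    ordered = []
    peak = 0
    for seq, page in completions:
        held[seq] = page
        peak = max(peak, len(held))
        # Release everything that is now contiguous with what was delivered.
        while next_seq in held:
            ordered.append((next_seq, held.pop(next_seq)))
            next_seq += 1
    return ordered, peak


# Requests 0..3 asked for pages 40, 3, 17, 5; the disk read them in
# ascending page order, so request 0 finished last.
ordered, peak = deliver_in_order([(1, 3), (3, 5), (2, 17), (0, 40)])
```

In this worst case all four pages sit in the buffer before the first one
can be handed over; in unordered mode each could be delivered immediately.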
It could be made very generic if instead of 'tuple id' you read 'opaque
application data', and instead of 'page' you read '(offset, length)'.
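
In the generic form, the 'reads them in clusters' step could look
something like the Python sketch below. The function name cluster_reads
and the 8192-byte sizes in the example are assumptions for illustration;
the opaque values are simply carried along untouched.

```python
def cluster_reads(requests):
    """requests: list of (offset, length, opaque) tuples.
    Returns a list of (offset, length, [opaque, ...]) where touching or
    overlapping byte ranges have been merged into one larger read."""
    clusters = []
    for offset, length, opaque in sorted(requests, key=lambda r: r[0]):
        if clusters and offset <= clusters[-1][0] + clusters[-1][1]:
            # Range starts inside or right at the end of the last cluster.
            start, size, tags = clusters[-1]
            end = max(start + size, offset + length)
            clusters[-1] = (start, end - start, tags + [opaque])
        else:
            clusters.append((offset, length, [opaque]))
    return clusters


reads = cluster_reads([
    (16384, 8192, "t3"),
    (0, 8192, "t1"),
    (8192, 8192, "t2"),
    (65536, 8192, "t4"),
])
# reads == [(0, 24576, ['t1', 't2', 't3']), (65536, 8192, ['t4'])]
```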
This structure actually exists already in the Linux kernel; it's called
the Elevator or something, but it works for scheduling reads between
You can also read 'internal not yet developed postgres cache manager'
instead of OS if you don't feel like talking kernel developers into
implementing this thing.
> (Those are ReadFileScatter and WriteFileGather)