|From:||Gregory Stark <stark(at)enterprisedb(dot)com>|
|To:||pgsql-hackers list <pgsql-hackers(at)postgresql(dot)org>|
|Subject:||There's random access and then there's random access|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
Recently there was a post on -performance about a particular case where
Postgres doesn't make very good use of the I/O system. This is when you try to
fetch many records spread throughout a table in random order.
Currently Postgres reads each record as needed and processes it. This means
even if you have a large raid array you get no benefit from it since you're
limited by the latency of each request. The raid array might let you run more
queries simultaneously but not improve the response time of a single query.
But in most cases, as in the use case in the email message above, we can do
substantially better. We can arrange to issue all the read requests without
blocking, then process the blocks either as they come in or in the order we
want blocking until they're actually satisfied. Handling them as they come in
is in theory more efficient but either way I would expect to see more or less
a speedup nearly equal to the number of drives in the array. Even on a single
drive it should slightly improve performance as it allows us to do some CPU
work while the I/O requests are pending.
The two interfaces I'm aware of for this are posix_fadvise() and libaio. I've
run tests with a synthetic benchmark which generates a large file then reads a
random selection of blocks from within it using either synchronous reads like
we do now or either of those interfaces. I saw impressive speed gains on a
machine with only three drives in a raid array. I did this a while ago so I
don't have the results handy. I'll rerun the tests again and post them.
I think this will be easiest to do for bitmap index scans. Since we gather up
all the pages we'll need before starting the heap scan we can easily skim
through them, issue posix_fadvises for at least a certain number ahead of the
actual read point and then proceed with the rest of the scan unchanged.
For regular index scans I'm not sure how easy it will be to beat them into
doing this but I suspect it might not be too hard to at least prefetch the
tuples in the page-at-a-time buffer. That's probably safer too since for such
scans we're more likely to not actually read all the results anyways; there
could be a limit or something else above which will stop us.
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
|Next Message||Douglas McNaught||2007-12-02 13:27:36||Re: There's random access and then there's random access|
|Previous Message||Neil Conway||2007-12-02 08:29:18||Re: String encoding during connection "handshake"|