Re: ice-broker scan thread

From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ice-broker scan thread
Date: 2005-11-29 03:53:36
Message-ID: Pine.LNX.4.58.0511291429330.18112@linuxworld.com.au
Lists: pgsql-hackers

On Mon, 28 Nov 2005, Qingqing Zhou wrote:

>
> I am considering add an "ice-broker scan thread" to accelerate PostgreSQL
> sequential scan IO speed. The basic idea of this thread is just like the
> "read-ahead" method, but the difference is this one does not read the data
> into shared buffer pool directly, instead, it reads the data into file
> system cache, which makes the integration easy and this is unique to
> PostgreSQL.
>

MySQL, Oracle and others implement read-ahead threads to simulate async IO
'pre-fetching'. I've been experimenting with two ideas. The first is to
increase the readahead when we're doing sequential scans (see the prototype
patch using posix_fadvise() attached). I don't have any hardware to test
this patch on at the moment, but I am waiting on some dbt-3 results which
should indicate whether fadvise is a good idea or a bad one.
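
To illustrate the first idea, here is a minimal sketch of hinting the kernel before a sequential scan. It is not the attached patch: the function name sequential_scan() and the 8192-byte BLOCK_SIZE are assumptions made for the example, and the only real API shown is posix_fadvise() with POSIX_FADV_SEQUENTIAL.

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192   /* assumption: one database block per read */

/*
 * Hypothetical helper: read a whole file block by block after telling the
 * kernel that access will be sequential. Returns the number of bytes read,
 * or -1 on error.
 */
long sequential_scan(const char *path)
{
    char    buf[BLOCK_SIZE];
    long    total = 0;
    ssize_t got;
    int     fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;

    /*
     * offset 0 and len 0 cover the whole file. The call is only advisory
     * and returns an error number rather than setting errno, so a failure
     * is ignored here and the scan simply proceeds without the hint.
     */
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    while ((got = read(fd, buf, sizeof(buf))) > 0)
        total += got;   /* a real scan would process the block here */

    close(fd);
    return got < 0 ? -1 : total;
}
```

Whether the hint changes the kernel's readahead window at all depends on the operating system and filesystem, which is why benchmark results are needed before drawing conclusions.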

The second idea is using POSIX async IO at key points within the system
to better parallelise CPU and IO work. The areas where I think we could use
async IO are: during sequential scans, use async IO to do pre-fetching of
blocks; inside WAL, begin flushing WAL buffers to disk before we commit;
and, inside the background writer/checkpoint process, asynchronously
write out pages and, potentially, asynchronously build new checkpoint segments.
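
As a rough illustration of the sequential scan case only, the sketch below uses POSIX AIO (aio_read, aio_error, aio_suspend, aio_return) to request block n+1 while block n is being processed. The names scan_with_prefetch() and start_read(), the two-buffer scheme, and the byte sum standing in for CPU work are all assumptions for the example, not PostgreSQL code. On older glibc versions it may need to be linked with -lrt.

```c
#define _POSIX_C_SOURCE 200112L
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192   /* assumption: one database block per request */

/* Hypothetical helper: queue an asynchronous read of one block. */
static int start_read(struct aiocb *cb, int fd, char *buf, off_t offset)
{
    memset(cb, 0, sizeof(*cb));
    cb->aio_fildes = fd;
    cb->aio_buf = buf;
    cb->aio_nbytes = BLOCK_SIZE;
    cb->aio_offset = offset;
    return aio_read(cb);
}

/*
 * Scan a file, overlapping the read of the next block with the processing
 * of the current one. Returns the sum of all bytes (a stand-in for real
 * per-block work), or -1 on error.
 */
long scan_with_prefetch(const char *path)
{
    char         bufs[2][BLOCK_SIZE];
    struct aiocb cb;
    long         checksum = 0;
    off_t        offset = 0;
    int          cur = 0;
    int          fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (start_read(&cb, fd, bufs[cur], offset) != 0) {
        close(fd);
        return -1;
    }

    for (;;) {
        const struct aiocb *const pending[1] = { &cb };
        ssize_t got, i;
        int     err;

        /* Wait for the outstanding request; retry if interrupted. */
        while ((err = aio_error(&cb)) == EINPROGRESS)
            aio_suspend(pending, 1, NULL);
        got = aio_return(&cb);
        if (err != 0 || got < 0) {
            close(fd);
            return -1;
        }
        if (got == 0)
            break;              /* end of file */

        /* Queue the next block into the other buffer before doing work. */
        offset += got;
        if (start_read(&cb, fd, bufs[1 - cur], offset) != 0) {
            close(fd);
            return -1;
        }

        /* CPU work on the current block runs while the next read proceeds. */
        for (i = 0; i < got; i++)
            checksum += (unsigned char) bufs[cur][i];
        cur = 1 - cur;
    }

    close(fd);
    return checksum;
}
```

Only one request is ever in flight here, so this shows the overlap of CPU and IO but not deeper queueing.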

The motivation for using async IO is twofold: first, the results of this
paper[1] are compelling; second, modern OSs support async IO. I know that
Linux[2], Solaris[3], AIX and Windows all have async IO and I presume that
all their rivals have it as well.

The fundamental premise of the paper mentioned above is that if the
database is busy, IO should be busy. With our current block-at-a-time
processing, this isn't always the case. This is why Qingqing's read-ahead
thread makes sense. My reason for mailing, however, is that I find the async
IO results more compelling than a read-ahead thread.

I haven't had time to prototype whether we can easily implement async IO
but I am planning to work on it in December. The two main goals will be to
a) integrate and utilise async IO, at least within the executor context,
and b) build a primitive kind of scheduler so that we stop prefetching
when we know that there are a certain number of outstanding IOs for a
given device.
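
One possible shape for goal b) is a per-device counter that refuses new prefetches once a limit is reached. This is only a sketch under assumptions: the PrefetchThrottle type, the throttle_* function names, MAX_DEVICES, and the idea of identifying a device by a small integer are all hypothetical, and nothing here is tied to a real IO interface.

```c
#include <stdbool.h>

#define MAX_DEVICES 8   /* hypothetical upper bound on tracked devices */

/* Hypothetical bookkeeping: outstanding prefetch requests per device. */
typedef struct PrefetchThrottle {
    int limit;                      /* max outstanding IOs per device */
    int outstanding[MAX_DEVICES];
} PrefetchThrottle;

void throttle_init(PrefetchThrottle *t, int limit)
{
    int i;

    t->limit = limit;
    for (i = 0; i < MAX_DEVICES; i++)
        t->outstanding[i] = 0;
}

/*
 * Returns true and records the request if the device is below its limit;
 * returns false if the caller should stop prefetching for now.
 */
bool throttle_try_issue(PrefetchThrottle *t, int dev)
{
    if (dev < 0 || dev >= MAX_DEVICES)
        return false;
    if (t->outstanding[dev] >= t->limit)
        return false;
    t->outstanding[dev]++;
    return true;
}

/* Called when a previously issued request for the device has completed. */
void throttle_complete(PrefetchThrottle *t, int dev)
{
    if (dev >= 0 && dev < MAX_DEVICES && t->outstanding[dev] > 0)
        t->outstanding[dev]--;
}
```

A caller would check throttle_try_issue() before queuing each prefetch and call throttle_complete() from wherever completions are collected. With several backends the counters would have to live in shared memory under a lock, which the sketch leaves out.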

Thanks,

Gavin

[1] http://www.vldb2005.org/program/paper/wed/p1116-hall.pdf
[2] http://lse.sourceforge.net/io/aionotes.txt
[3] http://developers.sun.com/solaris/articles/event_completion.html - I'm
fairly sure they have a POSIX AIO wrapper around these routines, but I
cannot see it documented anywhere :-(

Attachment Content-Type Size
fadvise.diff text/plain 10.2 KB
