From: Jeff Davis <pgsql@j-davis.com>
To: pgsql-hackers@postgresql.org
Subject: Synchronized Scan update
Date: 2007-02-27 23:14:18
Message-ID: 1172618058.10824.418.camel@dogma.v10.wvs

I have some interesting results from my tests of the Synchronized Scan
patch I'm working on.

The two benefits that I hope to achieve with the patch are:
(1) Better caching behavior with multiple sequential scans running in
parallel (see the sketch below)
(2) Faster sequential reads from disk and less seeking
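
To make the mechanism behind #1 concrete, here's a rough sketch in C.
This is not the actual patch code; the names (ScanHint,
syncscan_start_block, and so on) and the single-entry, lock-free
structure are simplifications for illustration. The idea is that each
scan publishes its current block number in shared memory, and a scan
that starts later begins at the published position instead of block 0,
so it trails close behind the scans already running and hits blocks
that are still in cache:

    /* Simplified stand-ins for PostgreSQL's Oid and BlockNumber. */
    typedef unsigned int Oid;
    typedef unsigned int BlockNumber;

    /*
     * Hypothetical per-relation hint kept in shared memory.  The real
     * patch has to handle locking and many relations at once; this
     * sketch ignores both.
     */
    typedef struct ScanHint
    {
        Oid         relid;     /* relation being scanned */
        BlockNumber curblock;  /* block most recently read by any scan */
    } ScanHint;

    /* A scan that is starting up picks its first block from the hint. */
    BlockNumber
    syncscan_start_block(ScanHint *hint, Oid relid, BlockNumber nblocks)
    {
        if (hint->relid == relid && hint->curblock < nblocks)
            return hint->curblock;  /* join the pack mid-scan */
        return 0;                   /* no hint; start at the beginning */
    }

    /* Each scan reports its position as it advances, so later scans
     * can find the pack. */
    void
    syncscan_report_block(ScanHint *hint, Oid relid, BlockNumber blocknum)
    {
        hint->relid = relid;
        hint->curblock = blocknum;
    }

A scan that joins at, say, block 1000 reads to the end of the relation
and then wraps around to read blocks 0-999, so it still visits every
block exactly once.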

I have consistently seen benefit #1. There is still more testing to be
done (hopefully soon), but I haven't found a problem yet. And the
benefits I've seen are very substantial, which isn't hard to achieve,
since in the typical case a large sequential scan has a cache hit rate
of essentially 0%. These numbers were retrieved using
log_executor_stats=on.

#2, however, is a little trickier. IIRC, Tom was the first to point out
that the I/O system might not recognize that reads coming from
different processes are really one sequential read.

At first I never actually saw the problem happen, and I assumed the OS
was being smart enough. Recently, however, I noticed it on my home
machine, which showed great caching behavior but poor I/O throughput
(as measured by iostat). That machine was using the Linux CFQ I/O
scheduler, and when I swapped CFQ for the anticipatory scheduler (AS),
throughput was great again. When I sent Josh my patch (per his request)
I mentioned the problem I had experienced.
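
For anyone who wants to reproduce this: on Linux 2.6 the active I/O
scheduler for a device can be changed at runtime through sysfs. A
trivial C helper, equivalent to
"echo anticipatory > /sys/block/sda/queue/scheduler", might look like
the following ("sda" and "anticipatory" are just example values; this
is an illustration, not part of the patch):

    #include <stdio.h>

    /*
     * Switch the Linux I/O scheduler for a block device via sysfs.
     * Must run as root.  Example: set_io_scheduler("sda", "anticipatory")
     */
    int
    set_io_scheduler(const char *dev, const char *sched)
    {
        char  path[128];
        FILE *fp;

        snprintf(path, sizeof(path), "/sys/block/%s/queue/scheduler", dev);
        fp = fopen(path, "w");
        if (fp == NULL)
            return -1;
        fprintf(fp, "%s\n", sched);
        return fclose(fp);
    }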

Then I started investigating, and found some mixed results. The test
was basically to start N processes of "dd if=bigfile of=/dev/null"
simultaneously and measure disk throughput with iostat (or zpool
iostat). I consider the test "passed" if the additional processes did
not interfere (i.e. each process finished as though it were the only
one running). Of course, all tests were I/O bound.
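
Roughly, the whole test is equivalent to the following C program
(NPROCS and the file name "bigfile" are placeholders for whatever you
want to test):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define NPROCS  4       /* number of concurrent readers (example) */
    #define BLOCKSZ 8192

    /* Sequentially read a file start to finish, discarding the data,
     * just like "dd if=bigfile of=/dev/null". */
    static void
    read_whole_file(const char *path)
    {
        char buf[BLOCKSZ];
        int  fd = open(path, O_RDONLY);

        if (fd < 0)
        {
            perror(path);
            _exit(1);
        }
        while (read(fd, buf, sizeof(buf)) > 0)
            ;
        close(fd);
        _exit(0);
    }

    int
    main(void)
    {
        int i;

        /* Start all readers at (nearly) the same time... */
        for (i = 0; i < NPROCS; i++)
            if (fork() == 0)
                read_whole_file("bigfile");

        /* ...and wait for them all.  The test passes if total wall
         * time stays close to what a single reader takes. */
        while (wait(NULL) > 0)
            ;
        return 0;
    }

Watching iostat while this runs shows immediately whether the readers
are interfering with each other.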

My home machine (Core 2 Duo, single SATA disk, Intel controller):
Linux/ext3/AS: passed
Linux/ext3/CFQ: failed
Linux/ext3/noop: passed
Linux/ext3/deadline: passed

Machine 2 (old ThinkPad, IDE disk):
Solaris/UFS: failed
Solaris/ZFS: passed

Machine 3 (Dell 2950, LSI PERC 5/i controller, 6 SAS disks, RAID-10,
adaptive read-ahead):
FreeBSD/UFS: failed

(I suspect the last test would pass with read-ahead always on; it may
just be a problem with the adaptive read-ahead feature.)

There are a lot of factors involved, because several components of the
I/O system, such as the block layer and the controller, have the
ability to reorder requests or read ahead.

Block request ordering can't be the only factor, because Solaris/UFS
orders requests by cylinder and moves in only one direction (i.e. it
looks like a simple elevator algorithm that isn't affected by process
ID), yet it still failed. At least, that's how I understand it.

Read-ahead can't be the only factor either, because replacing the I/O
scheduler in Linux solved the problem, even when the replacement was
the noop scheduler.

Anyway, back to the patch: it looks like there are some complications
if you use it with the wrong combination of filesystem, I/O scheduler,
and controller.

The patch is designed for certain query patterns anyway, so I don't
think this is a show-stopper. Given the better cache behavior, it seems
like it's really the I/O system's job to service what amounts to a
single, sequential stream of block requests efficiently.

The alternative would be to have a single block-reader process, which I
don't think we want to do. However, I/O systems don't really seem to
handle multiple processes reading from the same file very well.

Comments?

Regards,
Jeff Davis
