Re: PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Gerhard Wiesinger <gerhard(at)wiesinger(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL reads each 8k block - no larger blocks are used - even on sequential scans
Date: 2009-10-02 09:48:51
Message-ID: alpine.GSO.2.01.0910020521450.10008@westnet.com
Lists: pgsql-general

On Sun, 27 Sep 2009, Gerhard Wiesinger wrote:

> I think this is one of the most critical performance showstopper of
> PostgreSQL on the I/O side.

I wish it were; this is an easy problem compared to the really important ones that
need to be resolved. Situations where the OS is capable of faster
sequential I/O than PostgreSQL appears to deliver on reads
are often caused by something other than what the person doing the
benchmarking believes. For example, the last time I thought I
had a smoking-gun situation just like the one you're describing, it turned
out the background operation I didn't know was going on, which slowed things
down, was hint bit updates: http://wiki.postgresql.org/wiki/Hint_Bits

Background checkpoints can also cause this. Typically, if you set
checkpoint_segments really high and watch when checkpoints are happening,
you can keep them from interfering with your results too.
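For what it's worth, a postgresql.conf fragment along these lines does both of
those things; the values here are illustrative, not tuning recommendations:

```
# Spread checkpoints out so they rarely land in the middle of a benchmark run
checkpoint_segments = 64            # default is 3; higher = less frequent checkpoints
checkpoint_completion_target = 0.9  # spread each checkpoint's writes over more time

# Log every checkpoint so you can tell whether one overlapped your test
log_checkpoints = on
```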

It's hard to isolate the cause of issues like this. Since most people
seem to get something close to real disk speed from sequential scans when
measured properly, I would suggest starting with the assumption that there's
something wrong with your test case rather than with PostgreSQL. The best way
to do that is to construct a test case others can run that shows the same
problem on other systems, using the database itself. The easiest way to
build one is to use generate_series to create some bogus test
data, SELECT everything in there with \timing on, and then use the size of
the relation on disk to estimate MB/s.
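A sketch of such a test case, assuming a local server you can reach with psql
and a scratch database named "test" (both assumptions; adjust to your setup):

```shell
# Build ~600MB of bogus data, report its on-disk size, then time a full scan.
psql test <<'EOF'
DROP TABLE IF EXISTS seqscan_test;
CREATE TABLE seqscan_test AS
  SELECT i AS id, repeat('x', 100) AS padding
  FROM generate_series(1, 5000000) AS i;
-- On-disk size of the table's main fork, in bytes
SELECT pg_relation_size('seqscan_test') AS bytes;
\timing on
SELECT count(*) FROM seqscan_test;
EOF
# MB/s = (bytes / 1048576) / (elapsed seconds reported by \timing)
```

Run the SELECT more than once: the first pass pays for hint bit updates and may
populate caches, so only repeated runs (ideally with the OS cache dropped in
between) give numbers worth comparing across systems.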

Regardless, it's easy enough to build PostgreSQL with larger block sizes
if you think that really matters for your situation. You're never going
to see that in the mainstream version, though, because there are plenty of
downsides to using larger blocks. And since the database doesn't actually
know where things are located on disk, it's not really in a good position to
make decisions about I/O scheduling anyway. More on that below.
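For reference, the block size is a compile-time option, so experimenting with it
means a source build and a fresh initdb of the whole cluster; something along
these lines is the usual route:

```shell
# Block size is set at build time, in kilobytes; the default is 8.
# A cluster initdb'd with one block size cannot be used by a build with another.
./configure --with-blocksize=32
make && make install
```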

> What's the current status of the patch of Gregory Stark? Any timeframes to
> integrate?

There needs to be a fairly major rearchitecting of how PostgreSQL handles
incoming disk I/O for that to go anywhere, and I don't believe that's
expected to be ready in the near future.

> Does it also work for sequence scans? Any plans for a generic "multi block
> read count" solution?

There was a similar patch for sequential scans submitted by someone else
based on that work. It was claimed to help performance on a Linux system
with a rather poor disk I/O setup, but no one else was able to replicate any
performance improvement using the patch. As far as I've been able
to tell, the read-ahead logic done by the Linux kernel and by some
hardware is already performing this sort of optimization for you on that OS,
whether or not your application knows enough to recognize that it's sequentially
scanning the disk it's working against.
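On Linux you can inspect and tune that kernel read-ahead per block device; a
sketch, assuming your data lives on /dev/sda (an assumption, and the setra call
needs root):

```shell
# Read-ahead is per block device and expressed in 512-byte sectors.
blockdev --getra /dev/sda         # show the current setting (often 256 = 128kB)
blockdev --setra 4096 /dev/sda    # raise read-ahead to 2MB for this device
```

Raising it can help large sequential scans at the cost of wasted reads on
purely random workloads, so measure before and after rather than assuming.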

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
