Re: Read/Write block sizes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
Cc: Guy Thornley <guy(at)esphion(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Steve Poe <spoe(at)sfnet(dot)cc>, Chris Browne <cbbrowne(at)acm(dot)org>
Subject: Re: Read/Write block sizes
Date: 2005-08-24 05:56:44
Message-ID: 21700.1124863004@sss.pgh.pa.us
Lists: pgsql-performance

"Jeffrey W. Baker" <jwbaker(at)acm(dot)org> writes:
> On Wed, 2005-08-24 at 17:20 +1200, Guy Thornley wrote:
>> Don't forget that already in postgres, you have a process per connection, and
>> all the processes take care of their own I/O.

> That's the problem. Instead you want 1 or 4 or 10 i/o slaves
> coordinating the I/O of all the backends optimally. For instance, with
> synchronous scanning.

And why exactly are we going to do a better job of I/O scheduling than
the OS itself can do?

There's a fairly basic disconnect in viewpoint involved here. The
old-school viewpoint (as embodied in Oracle and a few other DBMSes)
is that the OS is too stupid to be worth anything, and the DB should
bypass the OS to the greatest extent possible, doing its own caching,
disk space layout, I/O scheduling, yadda yadda. That might have been
defensible twenty-odd years ago when Oracle was designed. Postgres
prefers to lay off to the OS anything that the OS can do well --- and
that definitely includes caching and I/O scheduling. There are a whole
lot of smart people working on those problems at the OS level. Maybe we
could make marginal improvements on their results after spending a lot
of effort reinventing the wheel ... but our time will be repaid much
more if we work at levels that the OS cannot have knowledge of, such as
join planning and data statistics.

There are some things we could do to reduce the impedance between us and
the OS --- for instance, the upthread criticism that a seqscan asks the
OS for only 8K at a time is fair enough. But that doesn't translate
to a conclusion that we should schedule the I/O instead of the OS.
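
To make the distinction concrete, here is a minimal sketch (not Postgres code, and in Python only for brevity) of a reader that still asks for 8K at a time but tells the kernel the access pattern is sequential, leaving readahead and scheduling to the OS. The names sequential_scan, demo and BLOCK_SIZE are made up for illustration, and the sketch assumes a platform where os.posix_fadvise is available; elsewhere the hint is simply skipped.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # assumed 8K block, matching the seqscan request size discussed above


def sequential_scan(path, block_size=BLOCK_SIZE):
    """Read a file block by block and return the number of bytes read.

    The kernel is told up front that access will be sequential; it remains
    free to decide how much to read ahead and how to schedule the I/O.
    """
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # posix_fadvise is not present on every platform, so guard the call.
        if hasattr(os, "posix_fadvise"):
            try:
                # offset=0, len=0 means "from the start to the end of the file".
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
            except OSError:
                # The advice is only a hint; carry on without it.
                pass
        while True:
            block = os.read(fd, block_size)
            if not block:  # empty bytes object signals end of file
                break
            total += len(block)
    finally:
        os.close(fd)
    return total


def demo():
    """Write a small scratch file, scan it, and return the byte count."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * (BLOCK_SIZE * 4 + 100))
        path = tmp.name
    try:
        return sequential_scan(path)
    finally:
        os.unlink(path)
```

The application only describes what it intends to do; whether and how far to read ahead is still the kernel's decision, which is the division of labor argued for above.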

regards, tom lane
