Re: Database Kernels and O_DIRECT

From: James Rogers <jamesr(at)best(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Database Kernels and O_DIRECT
Date: 2003-10-15 06:31:12
Message-ID: BBB237C0.5B35%jamesr@best.com
Lists: pgsql-hackers

On 10/14/03 8:26 PM, "Greg Stark" <gsstark(at)mit(dot)edu> wrote:
>
> All the more reason Postgres's view of the world should maybe be represented
> there. As it turns out Linus seems unsympathetic to the O_DIRECT approach and
> seems more interested in building a better kernel interface to control caching
> and i/o scheduling. Something that fits better with postgres's design than
> Oracle's.

This would certainly help Postgres as currently written, but it won't have
the theoretical performance headroom of what Oracle wants. A practical
kernel API is too narrow to be fully aware of and exploit database state.
And then there is the portability issue...

The way you want these kinds of things implemented in an operating system
kernel is somewhat orthogonal to how you want them implemented from the
perspective of a database kernel. Typical resource use cases for an
operating system and a database engine make pretty different assumptions and
the best you'll get is a compromise that doesn't optimize either.

Making additional optimizations to the OS kernel works great for Postgres
(on Linux, at least) because currently very little is optimized in this
regard. Basically Linus is doing some design optimization work for us. An
improvement, but kind of a mediocre one in the big scheme of things and not
terribly portable. If we suddenly wanted to optimize Postgres for
performance the way Oracle does, we would be a lot more keen on the O_DIRECT
approach.


> Actually I think it would be useful for the WAL. As I understand it there's no
> point caching the WAL and every write is going to get synced anyways so
> there's no point in buffering it either. The sooner the process can find out
> it's been synced the better. But I'm not really 100% up on the way the WAL is
> used so I could be wrong.

Aye, I think you may be correct.
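
To make the WAL point concrete, here is a minimal sketch of a log append
that only returns once the record should be on stable storage. The names
open_wal_segment and append_wal_record are made up for illustration and are
not Postgres functions. It asks for synchronous writes with O_DSYNC where the
platform offers it and falls back to an explicit fsync otherwise. It does not
use O_DIRECT, which on Linux would also need suitably aligned buffers.

```python
import os

# O_DSYNC is not available on every platform; 0 means "flag not supported".
_DSYNC = getattr(os, "O_DSYNC", 0)


def open_wal_segment(path):
    """Open a log file for appending, requesting synchronous writes if possible."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | _DSYNC
    return os.open(path, flags, 0o600)


def append_wal_record(fd, record):
    """Append one record; return the byte count once it should be durable."""
    view = memoryview(record)
    written = 0
    while written < len(view):
        # os.write may write fewer bytes than requested, so loop until done.
        written += os.write(fd, view[written:])
    if not _DSYNC:
        # Fallback for platforms without O_DSYNC: flush explicitly.
        os.fsync(fd)
    return written
```

Either way the caller sees one call that returns after the sync, which is
the behaviour the quoted paragraph is asking for.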


> Bah. So Oracle has to live with whatever OS features VMS had 20 years ago. It
> has to reimplement whatever I/O scheduling or other strategies it wants.
> Rather than being the escape from the "lowest common denominator" it is in
> fact precisely the cause of it.

You appear to have completely missed the point.

The point of the abstraction layer is so they can optimize the hell out of
the database for every single platform they support without having to
rewrite a bunch of the database every time. The database kernel API is
BETTER AND MORE OPTIMAL than the operating system API. It allows them to use
whatever memory management scheme, I/O scheme, etc is the best for every
single platform. If "the best" happens to be going to the native OS service,
then that is what they do, but most of the code doesn't need to know this if
the abstraction layer is well-designed.

Most of the code in a DBMS does not care where memory comes from, how it's
managed, what the file system actually looks like, or how I/O is done. As
long as the behavior is the same from the database kernel API it is writing
to, it is all good. What this means from a practical standpoint is that you
don't *have* to use SysV IPC on every platform, or POSIX, or mmap, or
whatever. You can use whatever that particular platform likes as long as it
can be mapped into the database kernel API, which tends to be at a high
enough level that just about *any* reasonable implementation of an OS API
can be mapped into it with quite a bit of optimization.

> You describe Postgres as if abstraction is a foreign concept to it. Much
> better to have well designed minimal abstractions for each of the resources
> needed, rather than trying to turn every OS you meet into the first one you
> met.

You have a serious misconception of what a database kernel is and looks
like.

A database kernel doesn't look like the OS kernel that is mapped to it. You
write a database kernel API that is idealized for database usage and
provides services specifically designed for the needs of a database. It is
a high-level API, not a mirror copy of standard OS APIs; if you did that,
you wouldn't have any room to do the database kernel implementation. You
then build an implementation of the API on the local system using whatever
operating system interfaces suit your fancy. The API is simple enough and
small enough that this isn't particularly difficult to do in a typical case.
And you can write a default kernel that is portable "as is" to most
operating systems.
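
As a rough sketch of the shape I mean (not how Oracle or anyone else
actually does it), here is a tiny page-level API with two implementations
behind it. Everything here is hypothetical: the DbKernel interface, the
8192-byte PAGE_SIZE, the class names, and the open_kernel selector. One
implementation is a default built on plain buffered file I/O; the other uses
positional reads and writes where the platform has them. Code written
against DbKernel never needs to know which one it got.

```python
import os
from abc import ABC, abstractmethod

PAGE_SIZE = 8192  # assumed page size for this sketch


class DbKernel(ABC):
    """Hypothetical database kernel API: page-level services, not OS calls."""

    @abstractmethod
    def read_page(self, page_no): ...

    @abstractmethod
    def write_page(self, page_no, data): ...

    @abstractmethod
    def sync(self): ...

    @abstractmethod
    def close(self): ...


def _check_page(data):
    if len(data) != PAGE_SIZE:
        raise ValueError("page must be exactly PAGE_SIZE bytes")


class PortableKernel(DbKernel):
    """Default implementation using ordinary buffered file I/O."""

    def __init__(self, path):
        self._f = open(path, "r+b" if os.path.exists(path) else "w+b")

    def read_page(self, page_no):
        self._f.seek(page_no * PAGE_SIZE)
        # Pages past the end of the file read back as zeros.
        return self._f.read(PAGE_SIZE).ljust(PAGE_SIZE, b"\0")

    def write_page(self, page_no, data):
        _check_page(data)
        self._f.seek(page_no * PAGE_SIZE)
        self._f.write(data)

    def sync(self):
        self._f.flush()
        os.fsync(self._f.fileno())

    def close(self):
        self._f.close()


class PreadKernel(DbKernel):
    """Platform-specific implementation using positional I/O (os.pread/os.pwrite)."""

    def __init__(self, path):
        self._fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)

    def read_page(self, page_no):
        # Short reads in the middle of a file are not handled in this sketch.
        data = os.pread(self._fd, PAGE_SIZE, page_no * PAGE_SIZE)
        return data.ljust(PAGE_SIZE, b"\0")

    def write_page(self, page_no, data):
        _check_page(data)
        offset, view, done = page_no * PAGE_SIZE, memoryview(data), 0
        while done < PAGE_SIZE:
            done += os.pwrite(self._fd, view[done:], offset + done)

    def sync(self):
        os.fsync(self._fd)

    def close(self):
        os.close(self._fd)


def open_kernel(path):
    """Pick an implementation for this platform; callers only see DbKernel."""
    if hasattr(os, "pread") and hasattr(os, "pwrite"):
        return PreadKernel(path)
    return PortableKernel(path)
```

A caller would just do k = open_kernel(path) and then use k.write_page(),
k.sync() and k.read_page(). Swapping in a direct-I/O or raw-device
implementation later would not touch that calling code.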

There is some abstraction in Postgres and the database is well-written, but
it isn't written in a manner that makes it easy to swap out operating system
or API models. It is written to be portable at all levels. A database
kernel isn't necessarily required to be portable at the very lowest level,
but it is vastly more optimizable because you aren't forced into a narrow
set of choices for interfacing with the operating system.

Operating system APIs are not particularly well-suited for databases, and if
you force a database to adhere to operating system APIs directly, you end up
with a suboptimal situation almost every single time. You end up with
implementations that you never would have done if you were targeting the
database for only that platform. Using a database kernel lets you make
platform specific optimizations and API selections without forcing most of
the database code to be aware of it.

Perhaps more to the point, who gives a damn what optimizations Linus puts in
the Linux kernel? What good does that do Postgres users on FreeBSD, or OSX,
or Windows? Abstracting a database engine to a set of operating system APIs
is never going to give stellar, or even uniform, results across all platforms
because the operating system APIs usually aren't written so that you could
write your database optimally.

Theoretically, it is the difference between middling performance in the
typical case and highly optimal performance in just about every case. A
database kernel lets you use an operating system in the way it likes to be
used rather than using an API that you just happen to support.

Cheers,

-James Rogers
jamesr(at)best(dot)com
