Re: seq scan cache vs. index cache smackdown

From: "Magnus Hagander" <mha(at)sollentuna(dot)net>
To: "Merlin Moncure" <merlin(dot)moncure(at)rcsonline(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: seq scan cache vs. index cache smackdown
Date: 2005-02-15 18:41:26
Message-ID: 6BCB9D8A16AC4241919521715F4D8BCE4768A7@algol.sollentuna.se
Lists: pgsql-performance

>Josh Berkus wrote:
>> Now you can see why other DBMSs don't use the OS disk cache. There's
>> other issues as well; for example, as long as we use the OS disk cache,
>> we can't eliminate checkpoint spikes, at least on Linux. No matter what
>> we do with the bgwriter, fsyncing the OS disk cache causes heavy system
>> activity.
>
>MS SQL server uses the O/S disk cache...

No, it doesn't. They open all files with FILE_FLAG_WRITE_THROUGH and
FILE_FLAG_NO_BUFFERING. SQL Server scales the size of its buffer cache
dynamically with the system, but it uses its own buffer cache, not the
OS one.

> the database is very tightly
>integrated with the O/S.

That it is.

>Write performance is one of the few things SQL
>server can do better than most other databases despite running on a
>mid-grade kernel and a low-grade filesystem...what does that say?
>ReadFileScatter() and ReadFileGather() were added to the win32 API
>specifically for SQL server...this is somewhat analogous to transaction
>based writing such as in Reisfer4.

(Those are ReadFileScatter and WriteFileGather)

I don't think that's correct either. Scatter/Gather I/O is used so that
SQL Server can issue reads for several blocks from disk into its own
buffer cache with a single syscall, even if these buffers are not
sequential in memory. It did make significant performance improvements
when they added it, though.

(For those who don't know: it's ReadFile/WriteFile where you pass an
array of "this many bytes to this address" as parameters)
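
To make the idea concrete, here is a minimal sketch of a scatter read.
It is an assumption-laden analogue, not the Win32 API: it uses the POSIX
readv() call through Python's os.readv (Unix only) instead of
ReadFileScatter, and the 8192-byte BLOCK size and the scatter_read/demo
helper names are made up for illustration.

```python
import os
import tempfile

BLOCK = 8192  # assumed block size, chosen only for this illustration


def scatter_read(path, nbuffers):
    """Read nbuffers consecutive blocks of a file into separate buffers
    using a single readv() call (POSIX analogue of a scatter read)."""
    # Separately allocated buffers: they need not be adjacent in memory.
    buffers = [bytearray(BLOCK) for _ in range(nbuffers)]
    fd = os.open(path, os.O_RDONLY)
    try:
        # One syscall fills the buffers in order from the file.
        nread = os.readv(fd, buffers)
    finally:
        os.close(fd)
    return nread, buffers


def demo():
    """Write three distinguishable blocks to a temp file, then read
    them back into three separate buffers with one call."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        for i in range(3):
            f.write(bytes([65 + i]) * BLOCK)  # blocks of 'A', 'B', 'C'
        path = f.name
    try:
        return scatter_read(path, 3)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    nread, bufs = demo()
    print(nread, [bytes(b[:1]) for b in bufs])
```

Each buffer ends up holding one block of the file, which is the same
shape as reading several disk blocks into non-adjacent buffer cache
slots with one call.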

> I'm not arguing ms sql server is
>better in any way, IIRC they are still using table locks (!).

Not at all. They use row level locks, escalated to page level, then
escalated to table level. It has been that way since 7.0. In <= 6.5 they
had page level and table level locks. I think possibly back in 4.2 (in
the 16-bit days on OS/2) they had only table level locks, but that's a
long time ago.
They don't do MVCC, though.
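
As a toy illustration of the row -> page -> table escalation described
above, here is a small sketch. Everything in it is hypothetical: the
ToyLockManager class, the thresholds, and the bookkeeping are invented
for the example and say nothing about how SQL Server actually decides
when to escalate.

```python
# Hypothetical thresholds; the real escalation rules are not given here.
ROWS_PER_PAGE_LIMIT = 3    # more row locks than this on a page -> page lock
PAGES_PER_TABLE_LIMIT = 2  # more page locks than this -> table lock


class ToyLockManager:
    """Toy model of lock escalation for a single table."""

    def __init__(self):
        self.row_locks = {}      # page id -> set of locked row ids
        self.page_locks = set()  # page ids locked as a whole
        self.table_locked = False

    def lock_row(self, page, row):
        """Lock one row and return the granularity that now covers it."""
        if self.table_locked:
            return "table"
        if page in self.page_locks:
            return "page"
        rows = self.row_locks.setdefault(page, set())
        rows.add(row)
        if len(rows) > ROWS_PER_PAGE_LIMIT:
            # Too many row locks on this page: trade them for one page lock.
            del self.row_locks[page]
            self.page_locks.add(page)
            if len(self.page_locks) > PAGES_PER_TABLE_LIMIT:
                # Too many page locks: trade everything for a table lock.
                self.page_locks.clear()
                self.row_locks.clear()
                self.table_locked = True
                return "table"
            return "page"
        return "row"


if __name__ == "__main__":
    lm = ToyLockManager()
    for page in range(3):
        for row in range(4):
            print(page, row, lm.lock_row(page, row))
```

The point of escalating is to cap the number of lock entries that have
to be tracked, at the cost of blocking more than was strictly needed.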

(I'm not saying it's better either. At some things it is, at many it is
not)

//Magnus
