Re: hanging for 30sec when checkpointing

From: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To: Peter Galbavy <peter(dot)galbavy(at)knowtion(dot)net>
Cc: Iain <iain(at)mst(dot)co(dot)jp>, Shane Wright <me(at)shanewright(dot)co(dot)uk>, <pgsql-admin(at)postgresql(dot)org>
Subject: Re: hanging for 30sec when checkpointing
Date: 2004-02-11 15:56:28
Message-ID: Pine.LNX.4.33.0402110849320.32376-100000@css120.ihs.com
Lists: pgsql-admin

On Wed, 11 Feb 2004, Peter Galbavy wrote:

> scott.marlowe wrote:
> > Oh, spreading misinformation isn't lying? You live in a different
> > world than I do.
>
> Again, I apologise if you took my comment so strongly. I can understand when
> something that someone works on so hard is criticised. OTOH, your
> original post that I replied to was very presumptive and over-generalised.
> If you say 100% of all IDE drives with write caches from 100% of
> manufacturers are broken, then I simply do not believe you.

Find me one that doesn't lie. No one I know has found one yet, and we've
looked around. I'd guess, though, that with the current state of IDE
drivers in the two common free Unixes / clones, if fsync were obeyed,
the throughput on writes would drop quite a bit, since those drivers are
pretty much one-thing-at-a-time oriented.
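
A rough way to see whether a drive is acknowledging writes from its cache is
to time a loop of small write-plus-fsync calls. The sketch below assumes a
single rotating disk: if every fsync really reached the platter, the rate
should not be far above one commit per rotation (about 120 per second at
7200 RPM). The function names, the 512-byte payload and the slack factor of 2
are illustrative choices, not taken from the tests discussed in this thread,
and a high rate is only a hint, not proof.

```python
import os
import tempfile
import time


def fsync_rate(path, count=200, payload=b"x" * 512):
    """Return the number of write+fsync cycles per second achieved on path."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)
            os.fsync(fd)  # ask the OS to push the data to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count / elapsed if elapsed > 0 else float("inf")


def max_honest_rate(rpm):
    """Assumed ceiling for a single disk: one durable commit per rotation."""
    return rpm / 60.0


def looks_cached(rate, rpm=7200, slack=2.0):
    """True if the measured rate is well above what the platter allows."""
    return rate > slack * max_honest_rate(rpm)


def probe_directory(directory, count=200):
    """Measure the fsync rate using a scratch file created in directory."""
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        return fsync_rate(path, count)
    finally:
        os.unlink(path)
```

Point probe_directory() at a directory on the drive under test and pass the
result to looks_cached(). RAID controllers with battery-backed cache, SSDs and
virtual disks will also report high rates, so the heuristic only makes sense
for a bare rotating drive.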

Your beliefs won't change the fact that no one has shown a single IDE
drive that doesn't lie. The fact that no one has shown a drive that
doesn't lie doesn't prove they all do either. But until I see one that
behaves properly, I'll err on the side of caution, and assume they all do.

And the above paragraph reads like part of the SQL 92 spec... :-)

> > Last year, I and several others on the pgsql lists ran a series of
> > tests to determine which drive subsystems could survive power off
> > tests. We ran the tests by initiating dozens or hundreds of
> > simultaneous transactions against a postgresql machine, then pulling
> > the plug in the middle.
>
> Thanks for that and the subsequent detail. What is still missing for me is
> the simple question: "What OSes were tested?" and, more specifically, was the
> OS driver code compared for the SCSI and IDE subsystems? It is highly
> possible and probable that underlying OS drivers for IDE and SCSI were
> written by different people, with different attention to following standards
> documentation.

I tested Linux; someone else tested on BSD. I do not know if any other
flavors of Unix were tested. It's in the archives, so you can search them
if you want to see.
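
For anyone who wants to repeat the pull-the-plug test without a database, the
idea can be sketched at the file level. This is not the harness used for the
tests described in this thread, just a minimal analogue under stated
assumptions: a writer appends numbered records, fsyncs each one, and only then
reports the number as committed (ideally to another machine, so the report
survives the power cut). After the reboot, any acknowledged record that is
missing from the file means something below fsync lied. The record format and
function names are made up for illustration.

```python
import os

RECORD = "{:010d}\n"  # fixed-width records make a torn write easy to spot
RECORD_LEN = 11       # ten digits plus the newline


def append_committed(fd, seq):
    """Append record seq and fsync; only afterwards may seq be reported."""
    os.write(fd, RECORD.format(seq).encode("ascii"))
    os.fsync(fd)
    return seq


def durable_prefix(path):
    """Count the intact, in-order records at the start of the file."""
    n = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD_LEN)
            if chunk != RECORD.format(n).encode("ascii"):
                break  # end of file, torn record, or out-of-order data
            n += 1
    return n


def lost_commits(path, last_acked):
    """Number of acknowledged records (0..last_acked) missing from path."""
    return max(0, (last_acked + 1) - durable_prefix(path))
```

Run the writer in a loop calling append_committed() with 0, 1, 2, ..., note
the last number it reported, pull the plug, and after reboot call
lost_commits() with that number. Anything above zero is a lost commit.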

We looked into it fairly closely, and if you read the comments in the IDE
code for both the BSD kernel and Linux kernel, you will see comments to
the effect that IDE drives basically all lie about fsync, along with notes
about how to get the manufacturers to make drives that don't.

Basically, the single-threaded design of earlier IDE interfaces is still
pretty much what's implemented today. The latest IDE interface specs
seem to allow some kind of command queueing, but no one has a driver to
take advantage of it for either BSD or Linux. I'm a bit rusty on the details;
it's been about 6 months or so. Searching for fsync and IDE on the BSD
and Linux kernel mailing lists should bring up some interesting results.
Who knows, the 2.6 Linux kernel or latest BSD kernels may finally be
addressing these issues.

> Is any of this written up anywhere with more details? It would make very
> interesting reading.

Other than in the archives, I don't think so.
