Re: Index Scans become Seq Scans after VACUUM ANALYSE

From: Curt Sampson <cjs(at)cynic(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "J(dot) R(dot) Nield" <jrnield(at)usol(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Michael Loftis <mloftis(at)wgops(dot)com>, mlw <markw(at)mohawksoft(dot)com>, PostgreSQL Hacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index Scans become Seq Scans after VACUUM ANALYSE
Date: 2002-06-23 16:10:26
Message-ID: Pine.NEB.4.43.0206240057070.2100-100000@angelic.cynic.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 23 Jun 2002, Tom Lane wrote:

> Curt Sampson <cjs(at)cynic(dot)net> writes:
> > This should also allow us to disable completely the ping-pong writes
> > if we have a disk subsystem that we trust.
>
> If we have a disk subsystem we trust, we just disable fsync on the
> WAL and the performance issue largely goes away.

No, you can't do this. If you don't fsync(), there's no guarantee
that the write ever got out of the computer's buffer cache and to
the disk subsystem in the first place.

> I concur with Bruce: the reason we keep page images in WAL is to
> minimize the number of places we have to fsync, and thus the amount of
> head movement required for a commit.

An fsync() does not necessarially cause head movement, or any real
disk writes at all. If you're writing to many external disk arrays,
for example, the fsync() ensures that the data are in the disk array's
non-volatile or UPS-backed RAM, no more. The array might hold the data
for quite some time before it actually writes it to disk.

But you're right that it's faster, if you're going to write out changed
pages and have have the ping-pong file and the transaction log on the
same disk, just to write out the entire page to the transaction log.

So what we would really need to implement, if we wanted to be more
efficient with trusted disk subsystems, would be the option of writing
to the log only the changed row or changed part of the row, or writing
the entire changed page. I don't know how hard this would be....

> > Well, whether or not there's a cheap way depends on whether you consider
> > fsync to be cheap. :-)
>
> It's never cheap :-(

Actually, with a good external RAID system with non-volatile RAM,
it's a good two to four orders of magnitude cheaper than writing to a
directly connected disk that doesn't claim the write is complete until
it's physically on disk. I'd say that it qualifies as at least "not
expensive." Not that you want to do it more often than you have to
anyway....

cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message J. R. Nield 2002-06-23 17:57:19 Re: Index Scans become Seq Scans after VACUUM ANALYSE
Previous Message Tom Lane 2002-06-23 16:01:39 Re: Suggestions for implementing IS DISTINCT FROM?