Re: Checkpoint cost, looks like it is WAL/CRC

From: Dawid Kuroczko <qnex42(at)gmail(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Russell Smith <mr-russ(at)pws(dot)com(dot)au>, josh(at)agliodbs(dot)com, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Checkpoint cost, looks like it is WAL/CRC
Date: 2005-07-08 09:41:23
Message-ID: 758d5e7f0507080241493c9d1d@mail.gmail.com
Lists: pgsql-hackers

On 7/7/05, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
> One idea would be to just tie its behavior directly to fsync and remove
> the option completely (that was the original TODO), or we can adjust it
> so it doesn't have the same risks as fsync, or the same lack of failure
> reporting as fsync.

I wonder about one thing: how much impact does the underlying filesystem have?
I mean, the problem with "partial writes" to pages is how to handle the situation
where the machine loses power and we are not sure whether the write was
completed or not.

But then again, imagine the data is on a filesystem with data journaling
(like ext3 with data=journal). There, to my understanding, the data is
first written into the journal before it is written to its final location
on the disk drive. If the drive loses power during the process, I guess there
would be two possible situations:
1) the modification was committed to the journal completely, so we can replay
the journal and be sure the 8kB block is fine. (*)
2) the modification in the journal is not complete. It has not been fully
committed to the filesystem journal, and we are safe to assume that the
drive still holds the old data.
(*) I am not sure if this is true for 8kB blocks, and of course, I don't have
good knowledge of ext3's journalling and its atomicity...
Assuming the above is true, it would be interesting to see how ext3
with data=journal and partial writes compares with ext3 using some other
data= mode without it.
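To make the two situations above concrete, here is a toy Python model, purely to illustrate the reasoning. Everything in it is an assumption: the names (`ToyDataJournal`, `simulate`) are made up, it is not ext3's real journal format, and it takes for granted exactly the points marked (*): that the commit record itself reaches the disk atomically and that the drive does not reorder journal writes ahead of it. Under those assumptions a crash at any step leaves the block either wholly old or wholly new after replay, never torn.

```python
from typing import Optional

BLOCK_SIZE = 8192  # assumption: one 8 kB PostgreSQL page per write


class ToyDataJournal:
    """Toy model of data=journal ordering (hypothetical, not ext3's real format)."""

    def __init__(self, nblocks: int) -> None:
        self.disk = [b"\x00" * BLOCK_SIZE for _ in range(nblocks)]
        # Journal records: ("data", txid, blockno, payload) or ("commit", txid).
        self.journal = []
        self.next_txid = 1

    def write(self, blockno: int, payload: bytes,
              crash_at: Optional[str] = None, torn: int = 0) -> bool:
        """Write one block; return False if power is 'lost' at crash_at.

        crash_at is None, "journal_data", "before_commit" or "in_place";
        torn is how many bytes of the interrupted write reach the medium.
        """
        assert len(payload) == BLOCK_SIZE
        txid = self.next_txid
        self.next_txid += 1
        # Step 1: copy the block into the journal (may be cut short).
        if crash_at == "journal_data":
            self.journal.append(("data", txid, blockno, payload[:torn]))
            return False
        self.journal.append(("data", txid, blockno, payload))
        # Step 2: the commit record; a transaction without one is discarded.
        if crash_at == "before_commit":
            return False
        self.journal.append(("commit", txid))
        # Step 3: write the block to its home location (may be torn).
        if crash_at == "in_place":
            old = self.disk[blockno]
            self.disk[blockno] = payload[:torn] + old[torn:]
            return False
        self.disk[blockno] = payload
        # Step 4: checkpoint; the journal copy is no longer needed.
        self.journal = [r for r in self.journal if r[1] != txid]
        return True

    def recover(self) -> None:
        """Replay committed transactions only, then empty the journal."""
        committed = {r[1] for r in self.journal if r[0] == "commit"}
        for rec in self.journal:
            if rec[0] == "data" and rec[1] in committed:
                _, _, blockno, payload = rec
                self.disk[blockno] = payload
        self.journal.clear()


def simulate(crash_at: Optional[str], torn: int = 4096) -> str:
    """Overwrite an 'old' block with a 'new' one, crash, recover, classify."""
    old, new = b"A" * BLOCK_SIZE, b"B" * BLOCK_SIZE
    fs = ToyDataJournal(1)
    fs.write(0, old)
    fs.write(0, new, crash_at=crash_at, torn=torn)
    fs.recover()
    if fs.disk[0] == new:
        return "new"
    return "old" if fs.disk[0] == old else "torn"


if __name__ == "__main__":
    for point in (None, "journal_data", "before_commit", "in_place"):
        print(point, simulate(point))
```

Running it prints `None new`, `journal_data old`, `before_commit old` and `in_place new`: an incomplete journal entry is simply thrown away (situation 2), while a block torn during the in-place write is overwritten by the complete copy in the journal (situation 1). Whether real drives and ext3 actually provide those ordering and atomicity guarantees is the open question.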

I don't have extensive knowledge of journalling internals, but I thought
I would mention it, so people with wider knowledge could give their
input here.

Regards,
Dawid
