Re: What is the relationship between checkpoint and wal

From: 高健 <luckyjackgao(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: What is the relationship between checkpoint and wal
Date: 2013-08-28 01:54:55
Message-ID: CAL454F1gPTSVpy63uaTdA4KRZ33+Xgnj-0g3UvWzhM7ZZi6ixQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello

Thanks for replying.

It is really a complicated concept.
So I think that in a mission critical environment , it is not a good choice
to turn full_page_writes on.

Best Regards

2013/8/27 Jeff Janes <jeff(dot)janes(at)gmail(dot)com>

> On Sun, Aug 25, 2013 at 7:57 PM, 高健 <luckyjackgao(at)gmail(dot)com> wrote:
> > Hi :
> >
> > Thanks to Alvaro! Sorry for replying lately.
> >
> > I have understood a little about it.
> >
> > But the description of full_page_write made me even confused. Sorry that
> > maybe I go to another problem:
> >
> > It is said:
> >
> http://www.postgresql.org/docs/9.2/static/runtime-config-wal.html#GUC-FULL-PAGE-WRITES
> > ----------
> > When this parameter is on, the PostgreSQL server writes the entire
> content
> > of each disk page to WAL during the first modification of that page
> after a
> > checkpoint. This is needed because a page write that is in process
> during an
> > operating system crash might be only partially completed, leading to an
> > on-disk page that contains a mix of old and new data.
> > -------
> >
> > Let me imagine that:
> > On a disk page, there are following data:
> >
> > id=1 val=1 with transaction id of 1001
> > id=2 val=2 with transaction id of 1002
> > id=3 val=3 with transaction id of 1003
> >
> > If I start DB,
> > And begin with transaction id of 2002 deal with data of id=2 ,making val
> to
> > 20
> > Then with trsansaction id of 2003 deal with data of id=3,making val to 30
> >
> > If With full_page_write =off,
> > When my checkpoint occur, it succeed with transaction 2002 but failed
> with
> > 2003 because of crash.
>
> A checkpoint either succeeds or fails. It cannot succeed with some
> transactions and fail with others.
>
> > Then disk page will be of:
> >
> > id=1 val=1 with transaction id of 1001------maybe this is the very old
> data
> > id=2 val=20 with transaction id of 2002------This is now new data
> > id=3 val=3 with transaction id of 1003------This is old data.
>
>
> Postgres does not do in-place updates, it marks the old row as
> obsolete and creates a new one.
>
> id=1 val= 1 with transaction id of 1001
> id=2 val= 2 with transaction id of 1002 xmax of 2002
> id=3 val= 3 with transaction id of 1003 xmax of 2003
> id=2 val=20 with transaction id of 2002
> id=3 val=30 with transaction id of 2003
>
> Of course the whole point of a torn page write is that you don't how
> much got written, so you don't know what is actually on the disk.
>
> > When DB restart from crash,
> > I think that there are wal data of transaction id 2002 and 2003 beause
> > that wal written to wal_buffer is before data written to shared_buffer.
>
> The wal is written out of wal_buffer and flushed, before the
> corresponding block is written out of shared_buffer.
>
> > So if Online wal log file is ok, there will be no data lost, and
> > roll-forward and roll-back can happen.
> > If some online wal log file is dmaged during crash:
> > There might be some data lost,but if we have archive log, we can restore
> > back due to archive wal log's latest transaction id.
>
> Most WAL records would have no problem being applied to a block that
> is an otherwise uncorrupted mix of old and new.
>
> But some WAL records have instructions that amount to "grab 134 bytes
> from offset 7134 in the block and move them to offset 1623". If the
> block is an unknown mix of old and new data, that can't be carried out
> safely.
>
> Cheers,
>
> Jeff
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2013-08-28 02:50:06 Re: pg_extension_config_dump() with a sequence
Previous Message 高健 2013-08-28 01:49:54 Re: Is there any method to limit resource usage in PG?