Re: What is the relationship between checkpoint and wal

From: 高健 <luckyjackgao(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: What is the relationship between checkpoint and wal
Date: 2013-08-26 02:57:43
Message-ID: CAL454F1cAifz90GYkpsRLRCmBAYwiX-XL=0oJGDHeO_51VDd7Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi :

Thanks to Alvaro! Sorry for replying lately.

I have understood a little about it.

But the description of full_page_write made me even confused. Sorry that
maybe I go to another problem:

It is said:
http://www.postgresql.org/docs/9.2/static/runtime-config-wal.html#GUC-FULL-PAGE-WRITES
----------
When this parameter is on, the PostgreSQL server writes the entire content
of each disk page to WAL during the first modification of that page after a
checkpoint. This is needed because a page write that is in process during
an operating system crash might be only partially completed, leading to an
on-disk page that contains a mix of old and new data.
-------

Let me imagine that:
On a disk page, there are following data:

id=1 val=1 with transaction id of 1001
id=2 val=2 with transaction id of 1002
id=3 val=3 with transaction id of 1003

If I start DB,
And begin with transaction id of 2002 deal with data of id=2 ,making val to
20
Then with trsansaction id of 2003 deal with data of id=3,making val to 30

If With full_page_write =off,
When my checkpoint occur, it succeed with transaction 2002 but failed with
2003 because of crash.
Then disk page will be of:

id=1 val=1 with transaction id of 1001------maybe this is the very old data
id=2 val=20 with transaction id of 2002------This is now new data
id=3 val=3 with transaction id of 1003------This is old data.

When DB restart from crash,
I think that there are wal data of transaction id 2002 and 2003 beause
that wal written to wal_buffer is before data written to shared_buffer.
So if Online wal log file is ok, there will be no data lost, and
roll-forward and roll-back can happen.
If some online wal log file is dmaged during crash:
There might be some data lost,but if we have archive log, we can restore
back due to archive wal log's latest transaction id.

If With full_page_write =on,
When my checkpoint occur, it succeed with transaction 2002,then no matter
transaction 2003's data written is ok or failed during checkpoint, the
transaction 2003's data will be written to wal file.
Then disk page will be of:

id=1 val=1 with transaction id of 1001------maybe this is the very old data
id=2 val=2 with transaction id of 2002------This is now new data
id=3 val=3 with transaction id of 1003------This is old data.

But in WAL there are following image:

id=1 val=1 with transaction id of 1001------maybe this is the very old data
id=2 val=20 with transaction id of 2002------This is now new data
id=3 val=30 with transaction id of 2003------This is old data.

I think that maybe it is because of this that data lost is less than
full_page_writes = off.
But why "writes the entire content of each disk page to WAL "?

the id=1 val=1 data is "very old", and even not in read into memory.
Why it should be from disk-->memory-->wal by wal writer?

maybe I have many mis-understanding about it. Thanks for replying!

Best Regards

2013/8/23 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>

> 高健 escribió:
>
> > ...
> > Checkpoints are fairly expensive, first because they require writing out
> > all currently dirty buffers, and second because they result in extra
> > subsequent WAL traffic as discussed above.
> > ...
> >
> > What confused me is that: (checkpoint)result in extra subsequent WAL
> > traffic as discussed above...
> >
> > I haven't found any more information can describe it in the "above" of
> that
> > page.
>
> It means that the first change following the checkpoint that affects any
> particular page will require a full page image of that page to be
> written to WAL. See the discussion in the preceding paragraph about
> "full_page_writes". It's not the checkpoint itself that writes this
> extra WAL, but the rest of the system.
>
> --
> Álvaro Herrera http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message 高健 2013-08-26 06:08:13 Is there any method to limit resource usage in PG?
Previous Message Merlin Moncure 2013-08-26 00:49:15 Re: batch insertion