
Re: crash-safe visibility map, take four

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: 高增琦 <pgf00a(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Jesper Krogh <jesper(at)krogh(dot)cc>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: crash-safe visibility map, take four
Date: 2011-03-31 10:31:24
Lists: pgsql-hackers
On 31.03.2011 11:33, 高增琦 wrote:
> Consider an example:
> 1. delete on two pages, emits two log records (1, page1, vm_clear_1),
> (2, page2, vm_clear_2)
> 2. "vm_clear_1" and "vm_clear_2" are on the same vm page
> 3. checkpoint, and the vm page gets torn, vm_clear_2 was lost
> 4. delete on another page, emits one log record (3, page1, vm_clear_3),
> vm_clear_3 is still on that vm page
> 5. power down
> 6. startup, redo will replay all changes after the checkpoint, but
> vm_clear_2 will never be cleared
> Am I right?

No. A page can only be torn at a hard crash, i.e. at step 5. A checkpoint 
flushes all changes to disk; once the checkpoint finishes, all the 
changes made before it are safe on disk.

If you crashed between step 2 and 3, the VM page might be torn so that 
only one of the vm_clears has made it to disk but the other has not. But 
the WAL records for both are on disk anyway, so that will be corrected 
at replay.
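
To illustrate why the torn page is harmless in that case, here is a toy
model, not PostgreSQL's actual redo code. The names VmClearRecord,
replay_vm_clears and torn_page_is_repaired are made up for this sketch, and
the VM page is reduced to an array of one flag per heap page. The point is
only that clearing a bit is idempotent, so replaying every record after the
checkpoint repairs the page regardless of which half of it reached disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model: one VM page with one all-visible flag per heap page. */
#define VM_BITS 8

typedef struct {
    uint64_t lsn;        /* position of the record in the log */
    int      heap_block; /* heap page whose VM bit must be cleared */
} VmClearRecord;

/*
 * Replay every record logged after the checkpoint. Clearing a bit is
 * idempotent, so it does not matter whether the on-disk page already
 * contained the change or not.
 */
static void replay_vm_clears(bool vm_page[VM_BITS],
                             const VmClearRecord *wal, size_t nrecords,
                             uint64_t checkpoint_lsn)
{
    for (size_t i = 0; i < nrecords; i++) {
        assert(wal[i].heap_block >= 0 && wal[i].heap_block < VM_BITS);
        if (wal[i].lsn > checkpoint_lsn)
            vm_page[wal[i].heap_block] = false;
    }
}

/*
 * Scenario: two clears hit the same VM page, then a crash tears the page
 * so that only the first clear reached disk. Both WAL records are on disk.
 * Returns true if both bits are clear after replay.
 */
static bool torn_page_is_repaired(void)
{
    bool on_disk[VM_BITS];
    for (int i = 0; i < VM_BITS; i++)
        on_disk[i] = true;               /* all heap pages start all-visible */

    const VmClearRecord wal[] = { {1, 1}, {2, 2} };

    on_disk[1] = false;                  /* torn write: vm_clear_2 was lost */

    replay_vm_clears(on_disk, wal, 2, 0 /* checkpoint precedes both records */);
    return !on_disk[1] && !on_disk[2];
}
```

After replay both bits are clear, whichever of the two changes the torn
write happened to preserve.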

>>> Another question:
>>> To address the problem in
>>> , should we just clear the vm before the log of insert/update/delete?
>>> This may reduce the performance, is there another solution?
>> Yeah, that's a straightforward way to fix it. I don't think the performance
>> hit will be too bad. But we need to be careful not to hold locks while doing
>> I/O, which might require some rearrangement of the code. We might want to do
>> a similar dance that we do in vacuum, and call visibilitymap_pin first, then
>> lock and update the heap page, and then set the VM bit while holding the
>> lock on the heap page.
> Do you mean we should lock the heap page first, then get the block number,
> then release the heap page,
> then pin the vm page, then lock both the heap page and the vm page?
> As Robert Haas said, when we lock the heap page again, there may not be
> enough free space on it.

I think the sequence would have to be:

1. Pin the heap page.
2. Check if the all-visible flag is set on the heap page (without lock). 
If it is, pin the vm page.
3. Lock the heap page, check that it has enough free space.
4. Check again if the all-visible flag is set. If it is but we didn't 
pin the vm page yet, release the lock and loop back to step 2.
5. Update the heap page.
6. Update the vm page.
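
As a rough sketch of that sequence, here is a single-threaded toy model.
Everything in it is hypothetical: HeapPage, VmPage, RaceHook and
modify_heap_page are invented for illustration and are not PostgreSQL
functions, and the "lock" is just a flag. The hook stands in for another
backend setting the all-visible flag between the unlocked check in step 2
and taking the lock in step 3. I'm also assuming that "update vm page" here
means clearing the bit, together with the flag on the heap page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool all_visible;   /* all-visible flag kept on the heap page */
    int  free_space;
    bool locked;
} HeapPage;

typedef struct {
    bool pinned;        /* pinning may need I/O, so do it without a lock */
    bool bit_set;
} VmPage;

/* Hypothetical hook, run once just before the lock is taken. */
typedef void (*RaceHook)(HeapPage *);

/* Returns the number of passes through the loop, or -1 if space ran out. */
static int modify_heap_page(HeapPage *heap, VmPage *vm, int needed,
                            RaceHook before_lock)
{
    int passes = 0;

    /* Step 1: the heap page is assumed to be pinned by the caller. */
    for (;;) {
        passes++;

        /* Step 2: unlocked check; pin the VM page if the flag looks set. */
        if (heap->all_visible && !vm->pinned)
            vm->pinned = true;

        if (before_lock != NULL) {      /* simulated concurrent activity */
            before_lock(heap);
            before_lock = NULL;
        }

        /* Step 3: lock the heap page and check free space. */
        heap->locked = true;
        if (heap->free_space < needed) {
            heap->locked = false;
            return -1;
        }

        /* Step 4: recheck under lock; retry if the pin is missing. */
        if (heap->all_visible && !vm->pinned) {
            heap->locked = false;
            continue;
        }
        break;
    }

    /* Step 5: update the heap page. */
    heap->free_space -= needed;

    /* Step 6: update the VM page while still holding the heap lock. */
    if (heap->all_visible) {
        assert(vm->pinned);
        heap->all_visible = false;
        vm->bit_set = false;
    }

    heap->locked = false;
    return passes;
}

/* Example hook: another backend marks the page all-visible. */
static void set_all_visible(HeapPage *heap)
{
    heap->all_visible = true;
}
```

With set_all_visible as the hook and the flag initially unset, the first
pass finds the flag set under lock without a pin, releases the lock, and
the second pass pins the vm page before locking again. No I/O happens while
the heap page lock is held.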

> Is there a way to just stop the checkpoint for a while?

Not at the moment. It wouldn't be hard to add, though. I was about to 
add a mechanism for that last autumn to fix a similar issue with b-tree 
parent pointer updates, but in the end it was solved differently.

   Heikki Linnakangas
