Re: Removing PD_ALL_VISIBLE

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Removing PD_ALL_VISIBLE
Date: 2013-01-17 09:55:07
Message-ID: CABOikdPHOPZ5VV1dHFu+WczFX2esvPrRzTvMXNJyy2AM1Bz83w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 17, 2013 at 2:57 PM, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>wrote:

>
>
> There was considerable discussion after this (accessible through the
> archives link above), which I won't attempt to summarise.
>
>
I thought Robert made those comments after considerable discussions on
Jeff's approach. So he probably still stands by his objections or at least
not satisfied/seen the numbers.

Now that I look at the patch, I wonder if there is another fundamental
issue with the patch. Since the patch removes WAL logging for the VM set
operation, this can happen:

1. Vacuum kicks in and clears all dead tuples in a page and decides that
its all-visible
2. Vacuum WAL-logs the cleanup activity and marks the page dirty
3. Vacuum sets the visibility bit and marks the VM page dirty
4. Say the VM page gets written to the disk. The heap page is not yet
written neither the WAL log corresponding to the cleanup operation
5. CRASH

After recovery, the VM bit will remain set because the VM page got written
before the crash. But since heap page's cleanup WAL did not made to the
disk, those operations won't be replayed. The heap page will be left with
not-all-visible tuples in that case and its not a good state to be in.

The original code does not have this problem because the VM set WAL gets
written after the heap page cleanup WAL. So its guaranteed that the VM bit
will be set during recovery only if the cleanup WAL is replayed too (there
is more magic than what meets the eye and I think its not fully
documented).

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Page 2013-01-17 10:09:19 Re: review: pgbench - aggregation of info written into log
Previous Message Magnus Hagander 2013-01-17 09:36:47 Re: review: pgbench - aggregation of info written into log