Re: Lack of PageSetLSN in heap_xlog_visible

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
Subject: Re: Lack of PageSetLSN in heap_xlog_visible
Date: 2022-11-11 01:20:35
Message-ID: 0c37b80e62b1f3007d5a6d1292bd8fa0c275627a.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 2022-10-13 at 12:49 -0700, Jeff Davis wrote:

> It may violate our torn page protections for checksums, as well...

I could not reproduce a problem here, but I believe one exists when
checksums are enabled, because it bypasses the protections of
UpdateMinRecoveryPoint(). By not updating the page's LSN, it would
allow the page to be flushed (and torn) without updating the minimum
recovery point. That would allow the administrator to subsequently do a
PITR or standby promotion while there's a torn page on disk.

I'm considering this a theoretical risk because for a page tear to be
consequential in this case, it would need to happen between the
checksum and the flags; and I doubt that's a real possibility. But
until we formalize that declaration, then this problem should be fixed.

Patch attached. I also fixed:

* The comment in heap_xlog_visible() says that not updating the page
LSN avoids full page writes; but the causation is backwards: avoiding
full page images requires that we don't update the page's LSN.
* Also in heap_xlog_visible(), there are comments and a branch
leftover from before commit f8f4227976 which don't seem to be necessary
any more.
* In visibilitymap_set(), I clarified that we *must* not set the page
LSN of the heap page if no full page image was taken.

> > But it still not clear for me that not bumping LSN in this place is
> > correct if wal_log_hints is set.
> > In this case we will have VM page with larger LSN than heap page,
> > because visibilitymap_set
> > bumps LSN of VM page. It means that in theory after recovery we may
> > have
> > page marked as all-visible in VM,
> > but not having PD_ALL_VISIBLE  in page header. And it violates VM
> > constraint:
>
> I'm not quite following this scenario. If the heap page has a lower
> LSN
> than the VM page, how could we recover to a point where the VM bit is
> set but the heap flag isn't?

I still don't understand this problem scenario.

--
Jeff Davis
PostgreSQL Contributor Team - AWS

Attachment Content-Type Size
v1-0001-Fix-theoretical-torn-page-hazard.patch text/x-patch 4.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message houzj.fnst@fujitsu.com 2022-11-11 02:26:33 RE: Perform streaming logical transactions by background workers and parallel apply
Previous Message Peter Geoghegan 2022-11-11 00:48:17 Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans