Re: Multixact slru doesn't don't force WAL flushes in SlruPhysicalWritePage()

From: Noah Misch <noah(at)leadboat(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: Multixact slru doesn't don't force WAL flushes in SlruPhysicalWritePage()
Date: 2015-11-11 04:22:47
Message-ID: 20151111042247.GA1212824@tornado.leadboat.com
Lists: pgsql-hackers

On Mon, Nov 09, 2015 at 10:40:07PM +0100, Andres Freund wrote:
> /*
> * Optional array of WAL flush LSNs associated with entries in the SLRU
> * pages. If not zero/NULL, we must flush WAL before writing pages (true
> * for pg_clog, false for multixact, pg_subtrans, pg_notify). group_lsn[]
> * has lsn_groups_per_page entries per buffer slot, each containing the
> * highest LSN known for a contiguous group of SLRU entries on that slot's
> * page.
> */
> XLogRecPtr *group_lsn;
> int lsn_groups_per_page;
>
> Uhm. multixacts historically didn't need to follow the
> write-WAL-before-data rule because it was zapped at restart. But it's
> now persistent.
>
> There are no comments about this choice anywhere in multixact.c, leading
> me to believe that this was not an intentional decision.

Here's the multixact.c comment justifying it:

* XLOG interactions: this module generates an XLOG record whenever a new
* OFFSETs or MEMBERs page is initialized to zeroes, as well as an XLOG record
* whenever a new MultiXactId is defined. This allows us to completely
* rebuild the data entered since the last checkpoint during XLOG replay.
* Because this is possible, we need not follow the normal rule of
* "write WAL before data"; the only correctness guarantee needed is that
* we flush and sync all dirty OFFSETs and MEMBERs pages to disk before a
* checkpoint is considered complete. If a page does make it to disk ahead
* of corresponding WAL records, it will be forcibly zeroed before use anyway.
* Therefore, we don't need to mark our pages with LSN information; we have
* enough synchronization already.

The comment's justification is incomplete, though. What of pages filled over
the course of multiple checkpoint cycles?
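
To make the question concrete, here is a toy model of one way to read it. It
is not PostgreSQL code: every name in it (Toy, zero_page, set_entry,
write_page, checkpoint, crash_and_recover, run_scenario) is hypothetical, WAL
is reduced to an in-memory array, and there is a single SLRU-like page. The
page is zeroed and partly filled in one checkpoint cycle, then gets another
entry in the next cycle. The sketch only shows that the zero-page record is
no longer replayed in that case; whether that is an actual hazard for
multixact depends on details not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ENTRIES_PER_PAGE 4
#define MAX_WAL 16

typedef enum { REC_ZERO_PAGE, REC_SET_ENTRY } RecType;
typedef struct { RecType type; int slot; uint32_t value; } WalRecord;

/* Hypothetical toy state: one page plus an in-memory "WAL". */
typedef struct {
    WalRecord wal[MAX_WAL];
    int inserted;   /* records generated so far */
    int flushed;    /* records that are durable */
    int redo;       /* first record replayed after a crash */
    int page_lsn;   /* highest record touching the page (like group_lsn) */
    uint32_t mem_page[ENTRIES_PER_PAGE];
    uint32_t disk_page[ENTRIES_PER_PAGE];
} Toy;

static int wal_insert(Toy *t, RecType type, int slot, uint32_t value) {
    assert(t->inserted < MAX_WAL);
    t->wal[t->inserted] = (WalRecord){ type, slot, value };
    return ++t->inserted;           /* "LSN" = count of records so far */
}

static void zero_page(Toy *t) {
    t->page_lsn = wal_insert(t, REC_ZERO_PAGE, 0, 0);
    memset(t->mem_page, 0, sizeof t->mem_page);
}

static void set_entry(Toy *t, int slot, uint32_t value) {
    t->page_lsn = wal_insert(t, REC_SET_ENTRY, slot, value);
    t->mem_page[slot] = value;
}

/* Write the page; optionally enforce "write WAL before data". */
static void write_page(Toy *t, bool wal_before_data) {
    if (wal_before_data && t->flushed < t->page_lsn)
        t->flushed = t->page_lsn;
    memcpy(t->disk_page, t->mem_page, sizeof t->disk_page);
}

/* Flush WAL, write the dirty page, then advance the redo point. */
static void checkpoint(Toy *t) {
    t->flushed = t->inserted;
    write_page(t, false);
    t->redo = t->inserted;
}

/* Lose unflushed WAL, then replay durable records from the redo point. */
static void crash_and_recover(Toy *t) {
    t->inserted = t->flushed;
    for (int i = t->redo; i < t->flushed; i++) {
        if (t->wal[i].type == REC_ZERO_PAGE)
            memset(t->disk_page, 0, sizeof t->disk_page);
        else
            t->disk_page[t->wal[i].slot] = t->wal[i].value;
    }
    memcpy(t->mem_page, t->disk_page, sizeof t->mem_page);
}

static bool has_durable_record(const Toy *t, int slot, uint32_t value) {
    for (int i = 0; i < t->flushed; i++)
        if (t->wal[i].type == REC_SET_ENTRY &&
            t->wal[i].slot == slot && t->wal[i].value == value)
            return true;
    return false;
}

/* Returns true if an on-disk entry survives with no durable WAL record. */
static bool run_scenario(bool wal_before_data) {
    Toy t = {0};
    zero_page(&t);                  /* cycle 1: page initialized */
    set_entry(&t, 0, 111);
    checkpoint(&t);                 /* zero-page record now precedes redo */
    set_entry(&t, 1, 222);          /* cycle 2: same page, WAL unflushed */
    write_page(&t, wal_before_data);
    crash_and_recover(&t);
    return t.disk_page[1] != 0 && !has_durable_record(&t, 1, 222);
}
```

In this model run_scenario(false) returns true: the page reaches disk ahead
of its WAL, and since the zero-page record predates the redo point, nothing
zeroes the page during replay. run_scenario(true) returns false, because the
write forces the WAL flush first.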
