Re: Rework the way multixact truncations work

From: Noah Misch <noah(at)leadboat(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Rework the way multixact truncations work
Date: 2015-11-08 08:54:07
Message-ID: 20151108085407.GA1097830@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 29, 2015 at 08:46:52AM +0100, Andres Freund wrote:
> On October 29, 2015 7:59:03 AM GMT+01:00, Noah Misch <noah(at)leadboat(dot)com> wrote:
> >That helps; thanks. Your design seems good. I've located only insipid
> >defects.
>
> Great!
>
> > I propose to save some time by writing a patch series
> >eliminating
> >them, which you could hopefully review. Does that sound good?
>
> Yes, it does.

I have pushed a stack of branches to https://github.com/nmisch/postgresql.git:

mxt0-revert - reverts commits 4f627f8 and aa29c1c
mxt1-disk-independent - see below
mxt2-cosmetic - update already-wrong comments and formatting
mxt3-main - replaces commit 4f627f8
mxt4-rm-legacy - replaces commit aa29c1c

The plan is to squash each branch into one PostgreSQL commit. In addition to
examining overall "git diff mxt2-cosmetic mxt3-main", I recommend reviewing
itemized changes and commit log entries in "git log -p --reverse --no-merges
mxt2-cosmetic..mxt3-main". In particular, when a change involves something
you discussed upthread or was otherwise not obvious, I put a statement of
rationale in the commit log.

> + * Make sure to only attempt truncation if there's values to truncate
> + * away. In normal processing values shouldn't go backwards, but there's
> + * some corner cases (due to bugs) where that's possible.

Which bugs are those? I would like to include more detail if available.

If anything here requires careful study, it's the small mxt1-disk-independent
change, which I have also attached in patch form. That patch removes the
SlruScanDirCbFindEarliest() test from TruncateMultiXact(), which in turn makes
multixact control decisions independent of whether TruncateMultiXact() is
successful at unlinking segments. Today, one undeletable segment file can
cause TruncateMultiXact() to give up on truncation completely for a span of
hundreds of millions of MultiXactId. Patched multixact.c will, like CLOG,
make its decisions strictly based on the content of files it expects to exist.
It will no longer depend on the absence of files it hopes will not exist.

To aid in explaining the change's effects, I will define some terms. A
"logical wrap" occurs when no range of 2^31 integers covers the set of
MultiXactId stored in xmax fields. A "file-level wrap" occurs when there
exists a pair of pg_multixact/offsets segment files such that:

| segno_a * SLRU_PAGES_PER_SEGMENT * MULTIXACT_OFFSETS_PER_PAGE -
segno_b * SLRU_PAGES_PER_SEGMENT * MULTIXACT_OFFSETS_PER_PAGE | > 2^31

A logical wrap implies either a file-level wrap or missing visibility
metadata, but a file-level wrap does not imply other consequences. The
SlruScanDirCbFindEarliest() test is usually redundant with
find_multixact_start(), because MultiXactIdPrecedes(oldestMXact, earliest)
almost implies that find_multixact_start() will fail. The exception arises
when pg_multixact/offsets files compose a file-level wrap, which can happen
when TruncateMultiXact() fails to unlink segments as planned. When it does
happen, the result of SlruScanDirCbFindEarliest(), and therefore the computed
"earliest" value, is undefined. (This outcome is connected to our requirement
to use only half the pg_clog or pg_multixact/offsets address space at any one
time. The PagePrecedes callbacks for these SLRUs cease to be transitive if
more than half their address space is in use.)

The SlruScanDirCbFindEarliest() test can be helpful when a file-level wrap
coexists with incorrect oldestMultiXactId (extant xmax values define "correct
oldestMultiXactId"). If we're lucky with readdir() order, the test will block
truncation so we don't delete still-needed segments. I am content to lose
that, because (a) the code isn't reliable for (or even directed toward) that
purpose and (b) sites running on today's latest point releases no longer have
incorrect oldestMultiXactId.

Attachment Content-Type Size
mxact-trunc-disk-independent-v1.patch text/plain 3.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Artur Zakirov 2015-11-08 11:23:30 Re: [PROPOSAL] Improvements of Hunspell dictionaries support
Previous Message Jörg Strebel 2015-11-08 08:39:50