vacuum vs heap_update_tuple() and multixactids

From: Andres Freund <andres(at)anarazel(dot)de>
To: PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: vacuum vs heap_update_tuple() and multixactids
Date: 2017-12-19 18:31:14
Message-ID: 20171219183114.q3utztzbqih34ltl@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Hi,

In [1] I'd discovered a only mildly related bug while reading code to
make sure my fix [2] et al was correct.

Quoting a couple messages by myself:
> Staring at the vacuumlazy hunk I think I might have found a related bug:
> heap_update_tuple() just copies the old xmax to the new tuple's xmax if
> a multixact and still running. It does so without verifying liveliness
> of members. Isn't that buggy? Consider what happens if we have three
> blocks: 1 has free space, two is being vacuumed and is locked, three is
> full and has a tuple that's key share locked by a live tuple and is
> updated by a dead xmax from before the xmin horizon. In that case afaict
> the multi will be copied from the third page to the first one. Which is
> quite bad, because vacuum already processed it, and we'll set
> relfrozenxid accordingly. I hope I'm missing something here?
>
> Trying to write a testcase for that now.
>
> This indeed happens, but I can't quite figure out a way to write an
> isolationtester test for this. The problem is that to have something
> reproducible one has to make vacuum block on a cleanup lock, but that
> currently doesn't register as waiting for the purpose of
> isolationtester's logic.

So what basically happens is that vacuum might be at block X, and a
concurrent update will move a tuple from a block > X to a block < X,
preserving the multixactid in xmax. Which can mean there later is a
multixactid in the table that's from before relminmxid.

I manually reproduced the issue, but it's pretty painful to do so
manually. I've not found any way to reliably do so in isolationtester so
far. Cleanup locks make it possible to schedule this without race
conditions, but isolationtester currently won't switch sessions when
blocked on a cleanup lock.

Could I perhaps convince somebody to add that as a feature to
isolationtester? I'm willing to work on a bugfix for the bug itself, but
I've already spent tremendous amounts of time, energy and pain on
multixact bugs, and I'm at the moment feeling a bit unenthusiastic about
also working on test infrastructure for it... If somebody else is
willing to work on both infrastructure *and* a bugfix, that's obviously
even better ;)

I think the bugfix is going to have to essentially be something similar
to FreezeMultiXactId(). I.e. when reusing an old tuple's xmax for a new
tuple version, we need to prune dead multixact members. I think we can
do so unconditionally and rely on multixact id caching layer to avoid
unnecesarily creating multis when all members are the same.

Greetings,

Andres Freund

[1] http://archives.postgresql.org/message-id/20171103145330.5ycjoje5s6lfwxps%40alap3.anarazel.de
[2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=9c2f0a6c3cc8bb85b78191579760dbe9fb7814ec

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Alvaro Herrera 2017-12-19 18:35:12 Re: vacuum vs heap_update_tuple() and multixactids
Previous Message Skarsol 2017-12-19 16:52:09 Re: BUG #14891: Old cancel request presented by pgbouncer honored after skipping a query.

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2017-12-19 18:35:12 Re: vacuum vs heap_update_tuple() and multixactids
Previous Message Robert Haas 2017-12-19 18:29:31 Re: explain analyze output with parallel workers - question about meaning of information for explain.depesz.com