Re: BUG #13681: Serialization failures caused by new multixact code of 9.3 (back-patch request)

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: odo(at)odoo(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #13681: Serialization failures caused by new multixact code of 9.3 (back-patch request)
Date: 2015-12-17 18:31:53
Message-ID: 20151217183153.GO2618@alvherre.pgsql
Lists: pgsql-bugs

odo(at)odoo(dot)com wrote:

> This is a back-patch request of 05315498012530d44cd89a209242a243374e274d to
> 9.3 and 9.4.
>
> As discussed in the -general list[1], both 9.3 and 9.4 show spurious
> serialization failures when faced with the use case included below.
>
> In 9.2, T2 used to block until T1's commit, but then continued without
> error, and in 9.5 both T1 and T2 proceed without blocking nor error.
>
> Kevin Grittner located[2] the root cause as a regression that was fixed by
> Álvaro at 0531549 [3].
>
> For what it's worth, our system uses many long-running transactions
> (background jobs, batch data imports, etc.) that are frequently interrupted
> and rolled back by micro-transactions coming from users who just happen to
> update minor data on their records (such as their last login date). So this
> bug appears to cause more than just a performance regression.

I would like to understand why that patch fixes the problem -- maybe
the fix is incidental and the real cause is something different. The
commit message states:

    Commit 0ac5ad5134f2 removed an optimization in multixact.c that skipped
    fetching members of MultiXactId that were older than our
    OldestVisibleMXactId value. The reason this was removed is that it is
    possible for multixacts that contain updates to be older than that
    value. However, if the caller is certain that the multi does not
    contain an update (because the infomask bits say so), it can pass this
    info down to GetMultiXactIdMembers, enabling it to use the old
    optimization.
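
To make the commit message concrete, here is a minimal toy sketch in C of the check it describes. It is not the code in multixact.c: the names `get_members_sketch`, `mxid_precedes`, `oldest_visible_mxact` and `lookups`, the fixed horizon of 100, and the pretend member count are all assumptions made for illustration. The only point is that the lookup is skipped when the caller asserts the multi is lock-only and the multi is older than the oldest-visible horizon.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactId;

#define INVALID_MXID ((MultiXactId) 0)

/* Hypothetical stand-in for this backend's OldestVisibleMXactId value. */
static MultiXactId oldest_visible_mxact = 100;

/* Counts how many times the (expensive) member lookup actually ran. */
static int lookups = 0;

/* Wraparound-aware "a is older than b" test on 32-bit IDs. */
static bool
mxid_precedes(MultiXactId a, MultiXactId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Toy member lookup.  Returns -1 when the lookup is skipped, otherwise a
 * pretend member count.  `only_lock` is what the caller derived from the
 * tuple's infomask bits: true means the multi cannot contain an update.
 */
static int
get_members_sketch(MultiXactId multi, bool only_lock)
{
    assert(multi != INVALID_MXID);

    /*
     * A lock-only multi older than the horizon can only hold lockers that
     * are no longer running, so there is nothing worth fetching.  A multi
     * that may contain an update must always be looked up.
     */
    if (only_lock && mxid_precedes(multi, oldest_visible_mxact))
        return -1;

    lookups++;          /* stands in for reading the members from disk */
    return 2;           /* pretend the multi has two members */
}
```

With the horizon at 100, `get_members_sketch(50, true)` returns -1 without touching `lookups`, while `get_members_sketch(50, false)` and `get_members_sketch(150, true)` both perform the lookup.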

In your test case,

>              T1                                 T2
> |-----------------------------|----------------------------------|
>   BEGIN ISOLATION LEVEL
>         REPEATABLE READ;
>
>   UPDATE orders
>   SET name = 'order of foo',
>       user_id = 1
>   WHERE id = 1;
>
>                                   BEGIN ISOLATION LEVEL
>                                         REPEATABLE READ;
>
>                                   UPDATE users
>                                   SET date = now()
>                                   WHERE id = 1;
>
>                                   COMMIT;
>
>   UPDATE orders
>   SET name = 'order of foo (2)',
>       user_id = 1
>   WHERE id = 1;

we have a transaction that takes a lock-only multi in table users, and
then when we do the second update we don't look it up because ...?
And then this causes the test case not to fail because ...?

I would like to understand the mechanism for this fix before declaring
that the fix is correct.

The patch doesn't apply cleanly because of other changes in the area, so
I attach the backpatched version here, as well as a test case for
isolationtester in a separate commit (with which it's easy to confirm
that the problem does exist and that the patch indeed fixes it).

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment                                                      Content-Type  Size
0001-Avoid-uselessly-looking-up-old-LOCK_ONLY-multixacts.patch  text/x-diff   12.4 KB
0002-Add-isolation-spec-for-13681.patch                         text/x-diff   2.7 KB
