Re: [9.4 bug] The database server hangs with write-heavy workload on Windows

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: MauMau <maumau307(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [9.4 bug] The database server hangs with write-heavy workload on Windows
Date: 2014-10-13 14:56:10
Message-ID: 543BE80A.4020407@vmware.com
Lists: pgsql-hackers

On 10/13/2014 10:47 AM, Heikki Linnakangas wrote:
> On 10/10/2014 05:08 PM, MauMau wrote:
>> From: "Craig Ringer" <craig(at)2ndquadrant(dot)com>
>>> It sounds like they've produced a test case, so they should be able to
>>> with a bit of luck.
>>>
>>> Or even better, send you the test case.
>>
>> I asked the user about this. It sounds like the relevant test case consists
>> of many scripts. He explained to me that the simplified test steps are:
>>
>> 1. initdb
>> 2. pg_ctl start
>> 3. Create 16 tables. Each of those tables consists of around 10 columns.
>> 4. Insert 1000 rows into each of those 16 tables.
>> 5. Launch 16 psql sessions concurrently. Each session updates all 1000 rows
>> of one table, e.g., session 1 updates table 1, session 2 updates table 2,
>> and so on.
>> 6. Repeat step 5 50 times.
>>
>> This sounds a bit complicated, but I understood that the core part is 16
>> concurrent updates, which should lead to contention on xlog insert slots
>> and/or spinlocks.
>
> I was able to reproduce this. I reduced wal_buffers to 64kB, and
> NUM_XLOGINSERT_LOCKS to 4 to increase the probability of the deadlock,
> and ran a test case as above on my laptop for several hours, and it
> finally hung. Will investigate...

Ok, I tracked the bug down to the way LWLockAcquireWithVar,
LWLockRelease, and LWLockWaitForVar work. Here's a simplified model of
how this happens:

Three backends are needed to cause the deadlock. Let's call them A, B
and C. There are two locks, and one of them protects a variable, i.e. is
used with LWLockAcquireWithVar et al. The backends run these operations:

A: (checkpointer does this in xlog.c)
LWLockAcquireWithVar(lock1, value1ptr, 0xFF)
LWLockAcquire(lock2);
LWLockRelease(lock1);
LWLockRelease(lock2);

B:
LWLockAcquire(lock2, LW_EXCLUSIVE);
LWLockWaitForVar(lock1, value1ptr, 0, &newval);
LWLockRelease(lock2);

C:
LWLockAcquireWithVar(lock1, value1ptr, 0);
LWLockRelease(lock1);

So, A acquires both locks, in the order lock1, lock2. B acquires lock2,
and then waits for lock1 to become free or have a non-zero value in the
variable. So A and B operate on the locks in opposite order, but this is
not supposed to deadlock, because A sets the variable to non-zero, and B
waits for it to become non-zero. Then there is a third action, C, that
just acquires lock1, with zero value.
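
To make B's wait condition concrete, here is a minimal sketch in C of the
predicate described above. The function name wait_for_var_satisfied and its
parameters are made up for illustration; this is not the code in lwlock.c,
only a model of "the lock is free, or the variable no longer has the old
value".

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the condition a LWLockWaitForVar-style waiter
 * sleeps on: it may proceed when the lock is not held, or when the
 * protected variable differs from the value the waiter passed in.
 */
bool
wait_for_var_satisfied(bool lock_held, uint64_t current, uint64_t oldval)
{
    return !lock_held || current != oldval;
}
```

With A holding lock1 and the variable set to 0xFF, the predicate is true for
B (oldval 0), so B only needs to be woken up to make progress. With C holding
lock1 and the variable at 0, it is false and B has to keep sleeping.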

This is the sequence that leads to the deadlock:

(both locks are free in the beginning)
C: LWLockAcquireWithVar(lock1, 0). Gets the lock.
A: LWLockAcquireWithVar(lock1, 0xFF). Blocks.
B: LWLockAcquire(lock2). Gets the lock.
B: LWLockWaitForVar(lock1, 0). Blocks.

C: LWLockRelease(lock1). Wakes up A and B. Sets releaseOK=false because
A is waiting for the lock in exclusive mode.
C: LWLockAcquireWithVar(lock1, 0). Steals the lock back before A or B
have had a chance to run.

B: Wakes up. Observes the lock is still taken, with val 0. Adds itself
back to wait queue and goes back to sleep.
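C: Releases lock1. releaseOK is false because A has not run yet, so
nobody is woken up.
A: Wakes up. Acquires lock1 with val 0xFF...
A: Blocks waiting on lock2.

At this point B is asleep on lock1's wait queue even though the value is
now non-zero, and A cannot release lock1 until B releases lock2.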

So the gist of the problem is that LWLockRelease doesn't wake up
LW_WAIT_UNTIL_FREE waiters when releaseOK == false. It should, because
a LW_WAIT_UNTIL_FREE waiter is free to run if the variable has changed
in value, and it won't steal the lock from the other backend that's
waiting to get the lock in exclusive mode anyway.
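
The wake-up rule can be sketched as a small standalone model in C. This is
not the attached patch and not the real LWLockRelease; choose_wakeups and the
wake_var_waiters_always switch are hypothetical names, shared lockers are
left out, and the only point is to compare "wake nobody when releaseOK is
false" with "always wake LW_WAIT_UNTIL_FREE waiters".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Waiter kinds, named after the modes discussed in this thread. */
typedef enum { LW_EXCLUSIVE, LW_WAIT_UNTIL_FREE } WaitMode;

/*
 * Hypothetical helper: decide which queued waiters a release wakes up.
 * queue[] is the wait queue in order; wake[i] is set to true for each
 * waiter that should be woken. Returns the new value of releaseOK.
 */
bool
choose_wakeups(const WaitMode *queue, size_t n, bool releaseOK,
               bool wake_var_waiters_always, bool *wake)
{
    bool woke_locker = false;

    assert(n == 0 || (queue != NULL && wake != NULL));

    for (size_t i = 0; i < n; i++)
    {
        wake[i] = false;
        if (queue[i] == LW_WAIT_UNTIL_FREE)
        {
            /* Never takes the lock, so waking it cannot steal anything. */
            wake[i] = releaseOK || wake_var_waiters_always;
        }
        else if (releaseOK && !woke_locker)
        {
            /* Wake at most one exclusive locker per release. */
            wake[i] = true;
            woke_locker = true;
        }
    }

    /* After waking a locker, hold back lock wake-ups until it has run. */
    return woke_locker ? false : releaseOK;
}
```

In the sequence above, C's second release sees a queue containing only B with
releaseOK == false. With wake_var_waiters_always set to false the model wakes
nobody, which is the hang; with it set to true, B is woken and can notice the
new value once A has acquired lock1.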

I noticed another potential bug: LWLockAcquireCommon doesn't use a
volatile pointer when it sets the value of the protected variable:

>     /* If there's a variable associated with this lock, initialize it */
>     if (valptr)
>         *valptr = val;
>
>     /* We are done updating shared state of the lock itself. */
>     SpinLockRelease(&lock->mutex);

If the compiler or CPU decides to reorder those two, so that the
variable is set after releasing the spinlock, things will break.
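
A minimal sketch of the volatile-pointer idea in C, under stated assumptions:
ToyLock, toy_spin_release and set_var_then_release are invented for this
example, and the toy release is just a volatile store. Compilers in practice
do not reorder two volatile accesses relative to each other, so this
addresses the compiler side only; on CPUs that can reorder stores, a real
spinlock release would also need the appropriate barrier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for a spinlock; NOT PostgreSQL's slock_t. */
typedef struct
{
    volatile int mutex;     /* 1 = held, 0 = free */
} ToyLock;

/* Toy release: a plain volatile store, no CPU memory barrier. */
void
toy_spin_release(ToyLock *lock)
{
    lock->mutex = 0;
}

/*
 * Store the protected value through a volatile-qualified pointer, then
 * release the lock. Both stores are volatile accesses, so the compiler
 * must keep them in program order.
 */
void
set_var_then_release(ToyLock *lock, uint64_t *valptr, uint64_t val)
{
    volatile uint64_t *vp = valptr;

    assert(lock != NULL);

    /* If there's a variable associated with this lock, set it first. */
    if (vp)
        *vp = val;

    toy_spin_release(lock);
}
```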

The attached patch should fix these two bugs. It is for REL9_4_STABLE;
it needs to be forward-patched to master too. This fixes the deadlock for
me. Does anyone see any issues with this?

Thanks MauMau for the testing!

- Heikki

Attachment                    Content-Type   Size
releaseok-with-var-1.patch    text/x-diff    3.9 KB
