Re: [9.4 bug] The database server hangs with write-heavy workload on Windows

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: MauMau <maumau307(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [9.4 bug] The database server hangs with write-heavy workload on Windows
Date: 2014-10-10 14:11:27
Message-ID: 20141010141127.GE6670@alap3.anarazel.de
Lists: pgsql-hackers

On 2014-10-10 23:08:34 +0900, MauMau wrote:
> From: "Craig Ringer" <craig(at)2ndquadrant(dot)com>
> >It sounds like they've produced a test case, so they should be able to
> >with a bit of luck.
> >
> >Or even better, send you the test case.
>
> I asked the user about this. It sounds like the relevant test case consists
> of many scripts. He explained to me that the simplified test steps are:
>
> 1. initdb
> 2. pg_ctl start
> 3. Create 16 tables. Each of those tables consists of around 10 columns.
> 4. Insert 1000 rows into each of those 16 tables.
> 5. Launch 16 psql sessions concurrently. Each session updates all 1000 rows
> of one table, e.g., session 1 updates table 1, session 2 updates table 2,
> and so on.
> 6. Repeat step 5 50 times.
>
> This sounds a bit complicated, but I understood that the core part is 16
> concurrent updates, which should lead to contention on xlog insert slots
> and/or spinlocks.

Hm. I've run similar loads on Linux for long enough that I'm relatively
sure I'd have seen this.

Could you get them to print out the contents of the lwlock all these
processes are waiting for?

> >Your next step here really needs to be to make this reproducible against
> >a debug build. Then see if reverting the xlog scalability work actually
> >changes the behaviour, given that you hypothesised that it could be
> >involved.

I don't think you can trivially revert the xlog scalability stuff.

> Thank you, but that may be labor-intensive and time-consuming. In addition,
> the user uses a machine with multiple CPU cores, while I only have a desktop
> PC with two CPU cores. So I doubt I can reproduce the problem on my PC.

Well, it'll also be labor-intensive for the community to debug.

> I asked the user to change S_UNLOCK to something like the following and run
> the test during this weekend (the next Monday is a national holiday in
> Japan).
>
> #define S_UNLOCK(lock) InterlockedExchange(lock, 0)

That shouldn't be required. For one, on 9.4 (not 9.5!) spinlock releases
only need to prevent reordering on the CPU level. As x86 is a TSO
architecture (total store order) that doesn't require doing anything
special. And even if it did require more, on MSVC volatile reads/stores
act as acquire/release fences unless you monkey with the compiler settings.
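
To make the distinction concrete, here is a minimal sketch in portable C11
atomics. It is not PostgreSQL's s_lock.h code; every demo_* name is
hypothetical and exists only for illustration. Variant A releases the lock
with a plain store that has release ordering, which common compilers are
expected to emit as an ordinary store on x86. Variant B releases with an
atomic exchange, roughly what the proposed InterlockedExchange(lock, 0)
does, which adds a full barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical illustration type; this is NOT PostgreSQL's slock_t. */
typedef atomic_int demo_slock_t;

/* Acquire: atomic test-and-set. Returns true if the lock was free. */
static bool demo_try_lock(demo_slock_t *lock)
{
    return atomic_exchange_explicit(lock, 1, memory_order_acquire) == 0;
}

/* Release variant A: store with release ordering, no read-modify-write. */
static void demo_unlock_store(demo_slock_t *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}

/* Release variant B: atomic exchange, comparable to InterlockedExchange. */
static void demo_unlock_xchg(demo_slock_t *lock)
{
    (void) atomic_exchange_explicit(lock, 0, memory_order_seq_cst);
}

/* Single-threaded round trip; returns the number of successful acquires. */
static int demo_roundtrip(void)
{
    demo_slock_t lock = 0;
    int acquired = 0;
    bool second;

    if (demo_try_lock(&lock))
        acquired++;

    second = demo_try_lock(&lock);  /* must fail: lock is already held */
    assert(!second);
    (void) second;

    demo_unlock_store(&lock);       /* variant A frees the lock */
    if (demo_try_lock(&lock))
        acquired++;

    demo_unlock_xchg(&lock);        /* variant B frees it as well */
    if (demo_try_lock(&lock))
        acquired++;

    demo_unlock_store(&lock);
    return acquired;
}
```

Both variants leave the lock free again; they differ only in how much
ordering they impose beyond what a release needs.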

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
