Re: Wierd context-switching issue on Xeon

From: Paul Tuckfield <paul(at)tuckfield(dot)com>
To: pg(at)fastcrypt(dot)com
Cc: Anjan Dave <adave(at)vantage(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Dirk Lutzebäck <lutzeb(at)aeccom(dot)com>, pgsql-performance(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Wierd context-switching issue on Xeon
Date: 2004-04-21 18:19:46
Message-ID: 78427208-93C0-11D8-BA67-000393BD6C3E@tuckfield.com
Lists: pgsql-performance

Dave:

Why would test-and-set increase context switches?
Note that it *does not increase* context switches when the two threads
are on the two cores of a single Xeon processor (use taskset to force
affinity on Linux).

Scenario:
If the two test-and-set processes are testing and setting the same bit
as each other, then they'll see worst-case cache coherency misses.
They'll ping a cache line back and forth between CPUs. Another case
might be that they're testing and setting different bits or words, but
those bits or words are always in the same cache line, again causing
worst-case cache coherency misses. The fact that this doesn't
happen when the threads are bound to the 2 cores of a single Xeon
suggests it's because they're now sharing L1 cache. No pings/bounces.
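
To make the second case concrete, here is a minimal C11 sketch of the two
layouts. It is only an illustration, not anything taken from PostgreSQL: the
struct names, the same_cache_line() helper, and the 64-byte line size are all
assumptions (check the real line size on your hardware).

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed cache line size in bytes; not queried from the hardware. */
#define CACHE_LINE 64

/* Two independent test-and-set flags that end up in the same cache line. */
struct packed_locks {
    alignas(CACHE_LINE) atomic_flag a;
    atomic_flag b;                      /* sits right next to a */
};

/* The same two flags, each forced onto its own cache line. */
struct padded_locks {
    alignas(CACHE_LINE) atomic_flag a;
    alignas(CACHE_LINE) atomic_flag b;
};

struct packed_locks packed = { ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT };
struct padded_locks padded = { ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT };

/* Nonzero when both addresses fall within one CACHE_LINE-sized block. */
int same_cache_line(const void *p, const void *q)
{
    return (uintptr_t)p / CACHE_LINE == (uintptr_t)q / CACHE_LINE;
}
```

With the packed layout, a CPU spinning on `a` and another spinning on `b`
still fight over one line even though they never touch the same flag. With
the padded layout each flag has a line to itself, at the cost of 64 bytes
per flag.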

I wonder whether the threads stall so badly when pinging cache lines back
and forth that the kernel sees it as an opportunity to put the
process to sleep. Or do these worst-case misses cause an interrupt?

My question is: what is it that the two threads are waiting for when they
spin? Is it exactly the same resource, or two resources that happen to
have test-and-set flags in the same cache line?
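
For reference, the spin-then-sleep pattern being discussed in the quoted
message can be sketched roughly as follows. This is not the code from
s_lock.c; spin_acquire(), spin_release(), delay_10ms(), and the max_delays
give-up limit are hypothetical names added for illustration, and C11
atomic_flag stands in for the platform TAS.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <sys/select.h>

/* Sleep for roughly 10 ms by calling select() with no file descriptors. */
void delay_10ms(void)
{
    struct timeval tv = { .tv_sec = 0, .tv_usec = 10000 };
    select(0, NULL, NULL, NULL, &tv);
}

/*
 * Try to take the lock. After every spins_per_delay failed test-and-set
 * attempts, sleep once; give up when max_delays sleeps have not helped.
 * Returns the number of sleeps taken, or -1 if the lock was never acquired.
 */
int spin_acquire(atomic_flag *lock, int spins_per_delay, int max_delays)
{
    int spins = 0;
    int delays = 0;

    /* atomic_flag_test_and_set returns true when the flag was already set. */
    while (atomic_flag_test_and_set(lock)) {
        if (++spins >= spins_per_delay) {
            if (delays >= max_delays)
                return -1;
            delay_10ms();
            delays++;
            spins = 0;
        }
    }
    return delays;
}

/* Release the lock so the next test-and-set succeeds. */
void spin_release(atomic_flag *lock)
{
    atomic_flag_clear(lock);
}
```

With spins_per_delay set to 1, every failed test-and-set is followed by a
sleep, which matches the modified loop quoted below. With 100, each waiter
hits the flag's cache line up to 100 times between sleeps.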

On Apr 20, 2004, at 7:41 PM, Dave Cramer wrote:

> I modified the code in s_lock.c to remove the spins
>
> #define SPINS_PER_DELAY 1
>
> and it doesn't exhibit the behaviour
>
> This effectively changes the code to
>
>
> while(TAS(lock))
> select(10000); // 10ms
>
> Can anyone explain why executing TAS 100 times would increase context
> switches ?
>
> Dave
>
>
> On Tue, 2004-04-20 at 12:59, Josh Berkus wrote:
>> Anjan,
>>
>>> Quad 2.0GHz XEON with highest load we have seen on the applications,
>>> DB
>>> performing great -
>>
>> Can you run Tom's test? It takes a particular pattern of data
>> access to
>> reproduce the issue.
> --
> Dave Cramer
> 519 939 0336
> ICQ # 14675561
>
