
Re: Wierd context-switching issue on Xeon

From: Paul Tuckfield <paul(at)tuckfield(dot)com>
To: pg(at)fastcrypt(dot)com
Cc: Anjan Dave <adave(at)vantage(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Dirk Lutzebäck <lutzeb(at)aeccom(dot)com>, pgsql-performance(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Wierd context-switching issue on Xeon
Date: 2004-04-21 18:19:46
Message-ID: 78427208-93C0-11D8-BA67-000393BD6C3E@tuckfield.com
Lists: pgsql-performance
Dave:

Why would test-and-set increase context switches?
Note that it *does not increase* context switches when the two threads
are on the two cores of a single Xeon processor (use taskset to force
affinity on Linux).

Scenario:
If the two test-and-set processes are testing and setting the same bit,
they'll see worst-case cache coherency misses: they'll ping a cache line
back and forth between CPUs. Another possibility is that they're testing
and setting different bits or words, but those bits or words always sit
in the same cache line, again causing worst-case coherency misses. The
fact that this doesn't happen when the threads are bound to the two
cores of a single Xeon suggests it's because they now share an L1 cache,
so there are no pings or bounces.


I wonder: do the threads stall so badly when pinging cache lines back
and forth that the kernel sees it as an opportunity to put the
process to sleep? Or do these worst-case misses cause an interrupt?

My question is: what are the two threads waiting for when they
spin? Is it exactly the same resource, or two resources that happen to
have test-and-set flags in the same cache line?

On Apr 20, 2004, at 7:41 PM, Dave Cramer wrote:

> I modified the code in s_lock.c to remove the spins
>
> #define SPINS_PER_DELAY         1
>
> and it doesn't exhibit the behaviour
>
> This effectively changes the code to
>
>
> while(TAS(lock))
> 	select(10000); // 10ms
>
> Can anyone explain why executing TAS 100 times would increase context
> switches ?
>
> Dave
>
>
> On Tue, 2004-04-20 at 12:59, Josh Berkus wrote:
>> Anjan,
>>
>>> Quad 2.0GHz XEON with highest load we have seen on the applications, 
>>> DB
>>> performing great -
>>
>> Can you run Tom's test?   It takes a particular pattern of data 
>> access to
>> reproduce the issue.


Copyright © 1996-2014 The PostgreSQL Global Development Group