Re: Wierd context-switching issue on Xeon

From: Dave Cramer <pg(at)fastcrypt(dot)com>
To: Paul Tuckfield <paul(at)tuckfield(dot)com>
Cc: Anjan Dave <adave(at)vantage(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Dirk Lutzebäck <lutzeb(at)aeccom(dot)com>, pgsql-performance(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Wierd context-switching issue on Xeon
Date: 2004-04-21 19:13:28
Message-ID: 1082574808.1558.243.camel@localhost.localdomain
Lists: pgsql-performance

FYI,

I am doing my testing on non-hyperthreading dual Athlons.

Also, the test and set is attempting to set the same resource, and not
simply a bit. It's really a lock;xchg in assembler.

Also we are using the PAUSE mnemonic, so we should not be seeing any
cache coherency issues, as the cache is being taken out of the picture
AFAICS?
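
For reference, here is a minimal sketch of the kind of loop under discussion, not the actual s_lock.c code. The names slock_t, tas, cpu_relax, sleep_usec, spin_acquire and spin_release are hypothetical, and SPINS_PER_DELAY 100 is only assumed from the "executing TAS 100 times" remark quoted below. It uses a C11 atomic exchange (which x86 compilers typically emit as xchg, implicitly locked when it has a memory operand), a PAUSE hint between attempts where the compiler provides one, and a 10 ms select() sleep once the spin budget is used up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

#define SPINS_PER_DELAY 100   /* assumed default; set to 1 to sleep after every failed TAS */
#define DELAY_USEC      10000 /* 10 ms, matching the quoted select() delay */

typedef atomic_int slock_t;   /* hypothetical stand-in for the real lock type */

/* Test-and-set: atomically store 1 and return the previous value.
 * A nonzero result means somebody else already holds the lock. */
static int tas(slock_t *lock)
{
    return atomic_exchange_explicit(lock, 1, memory_order_acquire);
}

/* Spin-wait hint: PAUSE on x86 with GCC/Clang, a no-op elsewhere. */
static void cpu_relax(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_ia32_pause();
#endif
}

/* Sleep by calling select() with no descriptors and a timeout. */
static void sleep_usec(long usec)
{
    struct timeval tv;
    tv.tv_sec = usec / 1000000;
    tv.tv_usec = usec % 1000000;
    select(0, NULL, NULL, NULL, &tv);
}

/* Spin up to SPINS_PER_DELAY times, then sleep and start over.
 * Returns how many times the caller had to sleep. */
static int spin_acquire(slock_t *lock)
{
    int spins = 0;
    int sleeps = 0;

    while (tas(lock)) {
        cpu_relax();
        if (++spins >= SPINS_PER_DELAY) {
            sleep_usec(DELAY_USEC);
            sleeps++;
            spins = 0;
        }
    }
    return sleeps;
}

/* Release the lock; it must currently be held. */
static void spin_release(slock_t *lock)
{
    assert(atomic_load(lock) != 0);
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

With SPINS_PER_DELAY set to 1 this collapses to the while(TAS(lock)) select(...) form quoted below.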

Dave

On Wed, 2004-04-21 at 14:19, Paul Tuckfield wrote:
> Dave:
>
> Why would test and set increase context switches:
> Note that it *does not increase* context switches when the two threads
> are on the two cores of a single Xeon processor. (use taskset to force
> affinity on linux)
>
> Scenario:
> If the two test and set processes are testing and setting the same bit
> as each other, then they'll see worst case cache coherency misses.
> They'll ping a cache line back and forth between CPUs. Another case,
> might be that they're testing and setting different bits or words, but
> those bits or words are always in the same cache line, again causing
> worst case cache coherency and misses. The fact that this doesn't
> happen when the threads are bound to the 2 cores of a single Xeon
> suggests it's because they're now sharing L1 cache. No pings/bounces.
>
>
> I wonder do the threads stall so badly when pinging cache lines back
> and forth, that the kernel sees it as an opportunity to put the
> process to sleep? or do these worst case misses cause an interrupt?
>
> My question is: What is it that the two threads are waiting for when they
> spin? Is it exactly the same resource, or two resources that happen to
> have test-and-set flags in the same cache line?
>
> On Apr 20, 2004, at 7:41 PM, Dave Cramer wrote:
>
> > I modified the code in s_lock.c to remove the spins
> >
> > #define SPINS_PER_DELAY 1
> >
> > and it doesn't exhibit the behaviour
> >
> > This effectively changes the code to
> >
> >
> > while(TAS(lock))
> > select(10000); // 10ms
> >
> > Can anyone explain why executing TAS 100 times would increase context
> > switches ?
> >
> > Dave
> >
> >
> > On Tue, 2004-04-20 at 12:59, Josh Berkus wrote:
> >> Anjan,
> >>
> >>> Quad 2.0GHz XEON with highest load we have seen on the applications, DB
> >>> performing great -
> >>
> >> Can you run Tom's test? It takes a particular pattern of data access to
> >> reproduce the issue.
> > --
> > Dave Cramer
> > 519 939 0336
> > ICQ # 14675561
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 8: explain analyze is your friend
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
> joining column's datatypes do not match
>
>
>
>
>
--
Dave Cramer
519 939 0336
ICQ # 14675561
