Re: [HACKERS] S_LOCK reduced contention

From: dg(at)illustra(dot)com (David Gould)
To: tgl(at)sss(dot)pgh(dot)pa(dot)us (Tom Lane)
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] S_LOCK reduced contention
Date: 1998-06-15 22:27:26
Message-ID: 9806152227.AA09687@hawk.illustra.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane writes:
> dg(at)illustra(dot)com (David Gould) writes:
> > [ Much careful testing snipped ]
> > In some of the test cases there was significant timing variation from
> > run to run even though the test conditions were apparently identical.
> > Even more strangely, the variation in time was not random but appeared
> > to represent two different modes. And, the variation was itself
> > repeatable.
> > [snip]
> > I have no explanation for this variation. Possibly it is some interaction
> > of where the program is loaded and the state of the memory heirarchy, but
> > even this is hard to sustain. I would be very curious to hear of any
> > plausible explainations.
>
> After chewing on this for a while I think that your speculation is
> right. You were using a Pentium, you said. The Pentium has a two-way
> set associative cache, which means that any given main-memory address
> has exactly two cache lines it could be loaded into. Main-memory
> addresses that are 1000H apart contend for the same pair of cache lines.
> Thus, if your program happens to repeatedly hit three locations that are
> exactly 1000H apart, it will suffer a cache miss every time. Change the
> address spacing, and no miss occurs. The cache miss takes forty-some
> clock cycles, versus one if the target location is in cache.
> (BTW, I'm getting this info out of Rick Booth's "Inner Loops", a fine
> reference book if you are into hand-optimized assembly coding for Intel
> processors.)
>
> So what I think you were seeing is that on some runs, the loop involved
> hitting three addresses that contend for the same cache line pair, while
> on other runs there was no cache contention. This could be explained
> by varying load addresses for the program, if it touched both its own
> addresses (variable) and some non-varying addresses --- say, C library
> routines executed from a shared library that remained loaded throughout.
> If you can link with a non-shared C library then you should get more
> consistent runtimes, because the offsets between all the locations
> touched by your loop would be fixed, and thus cache hits or misses ought
> to be consistent from run to run.

This is in line with my own speculation. However, I am not convinced.

First, the test loop and the function it calls are only about 100 bytes
total taken together. And, no system calls or library calls are made during
the test. This tends to rule out "locations 1000H apart" and shared library
effects. Also, I would expect the system to load programs and libraries
on VM page boundaries. Unless there is some cachability difference from one
page to the next, I am at a loss to account for this.

> The bottom line, however, is that this behavior is too unpredictable
> to be worth worrying about in a production program. A cache miss in
> a tight loop could be created or eliminated by unrelated changes in
> distant parts of the code ... and in any case the behavior will be
> different on different CPUs. The 486, Pentium, and Pentium Pro all
> have radically different cache layouts, let alone non-Intel CPUs.

Agreed, the reasonable thing to do is to try to be sensitive to cache
effects and accept that there are mysteries.

thanks

-dg

David Gould dg(at)illustra(dot)com 510.628.3783 or 510.305.9468
Informix Software (No, really) 300 Lakeside Drive Oakland, CA 94612
"Don't worry about people stealing your ideas. If your ideas are any
good, you'll have to ram them down people's throats." -- Howard Aiken

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Hartwig 1998-06-15 23:15:37 Re: [HACKERS] User authentification failed
Previous Message Dmitry Samersoff 1998-06-15 21:59:53 User authentification failed