
Re: futex results with dbt-3

From: Manfred Spraul <manfred(at)colorfullife(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: neilc(at)samurai(dot)com, markw(at)osdl(dot)org,pgsql-performance(at)postgresql(dot)org
Subject: Re: futex results with dbt-3
Date: 2004-10-20 17:39:13
Message-ID: 4176A2C1.3070205@colorfullife.com
Lists: pgsql-performance
Tom Lane wrote:

>Manfred Spraul <manfred(at)colorfullife(dot)com> writes:
>>Tom Lane wrote:
>>>The bigger problem here is that the SMP locking bottlenecks we are
>>>currently seeing are *hardware* issues (AFAICT anyway).  The only way
>>>that futexes can offer a performance win is if they have a smarter way
>>>of executing the basic atomic-test-and-set sequence than we do;
>>>
>>lwlocks operations are not a basic atomic-test-and-set sequence. They
>>are spinlock, several nonatomic operations, spin_unlock.
>>
>Right, and it is the spinlock that is the problem.  See discussions a
>few months back: at least on Intel SMP machines, most of the problem
>seems to have to do with trading the spinlock's cache line back and
>forth between CPUs.
>
I'd disagree: cache line bouncing is one problem. If it happens, there
is only one solution: the number of changes to that cache line must be
reduced. The tools used in the Linux kernel are:
- Hashing. An emergency approach if there is no other solution. I think
Red Hat used it for the buffer cache in RH AS: instead of one buffer
cache, there were lots of smaller buffer caches with individual locks.
The cache was chosen based on the file position (probably mixed with
some pointers to avoid overloading cache 0).
- For read-heavy loads: sequence locks. A reader reads a counter value
and then accesses the data structure. At the end it checks whether the
counter was modified. If it still has the same value, the reader can
continue; otherwise it must retry. Writers acquire a normal spinlock
and increment the counter around their update. RCU is the second
option, but there are patents; please be careful before using that tool.
- Complete rewrites that avoid the global lock. I think the global
buffer cache is now gone; everything is handled per file. I think there
is still a global list for buffer replacement, but at the top of the
buffer replacement strategy is a simple clock algorithm. That means
simple lookups/accesses just set a (local) referenced bit and don't have
to acquire a global lock. I know that this is the total opposite of ARC,
but perhaps it's the only scalable solution. ARC could be used as the
second-level strategy.
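To make the hashing approach concrete, here is a minimal sketch in C, assuming a cache keyed by a file id and a block number. The partition count, the multiplicative hash constant, the 64-byte cache line size, and every name (cache_partition, partition_of, cache_lookup) are illustrative assumptions, not Red Hat's or PostgreSQL's code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical lock striping: instead of one global lock for the whole
 * buffer cache, keep N_PARTITIONS smaller caches, each with its own lock. */
#define N_PARTITIONS 16

typedef struct {
    /* Assumes 64-byte cache lines: each partition's lock gets its own
     * line, so CPUs working on different partitions do not bounce it. */
    _Alignas(64) pthread_mutex_t lock;
    long lookups;            /* stand-in for the partition's real contents */
} cache_partition;

static cache_partition partitions[N_PARTITIONS];

void cache_init(void)
{
    for (int i = 0; i < N_PARTITIONS; i++) {
        pthread_mutex_init(&partitions[i].lock, NULL);
        partitions[i].lookups = 0;
    }
}

/* Mix the file id into the block number so that block 0 of every file
 * does not end up in partition 0. */
unsigned partition_of(uint32_t file_id, uint32_t block)
{
    uint32_t h = block ^ (file_id * 2654435761u);  /* multiplicative hash */
    h ^= h >> 16;
    return h % N_PARTITIONS;
}

void cache_lookup(uint32_t file_id, uint32_t block)
{
    unsigned idx = partition_of(file_id, block);
    assert(idx < N_PARTITIONS);
    cache_partition *p = &partitions[idx];

    pthread_mutex_lock(&p->lock);   /* only this partition is locked */
    p->lookups++;                   /* real code would search this partition */
    pthread_mutex_unlock(&p->lock);
}
```

Hashing does not remove the lock; it only spreads the writes over N_PARTITIONS cache lines, which is why it is described above as an emergency approach.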
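A sequence lock like the one in the second item could look roughly like this. It is a sketch with C11 atomics, not the Linux kernel's seqlock_t: every access is sequentially consistent for simplicity (the kernel uses cheaper explicit barriers), and the names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical seqlock protecting a pair of values. Readers never write
 * shared memory, so a read-mostly load does not bounce the lock's line. */
typedef struct {
    atomic_uint seq;          /* odd while a writer is active */
    atomic_flag writer_lock;  /* plain spinlock serializing writers */
    atomic_long x, y;         /* the protected data */
} seqlock_pair;

void seq_init(seqlock_pair *s)
{
    atomic_init(&s->seq, 0);
    atomic_flag_clear(&s->writer_lock);
    atomic_init(&s->x, 0);
    atomic_init(&s->y, 0);
}

void seq_write(seqlock_pair *s, long x, long y)
{
    while (atomic_flag_test_and_set(&s->writer_lock))
        ;                                   /* spin until no other writer */
    atomic_fetch_add(&s->seq, 1);           /* odd: update in progress */
    assert(atomic_load(&s->seq) % 2 == 1);
    atomic_store(&s->x, x);
    atomic_store(&s->y, y);
    atomic_fetch_add(&s->seq, 1);           /* even again: update done */
    atomic_flag_clear(&s->writer_lock);
}

void seq_read(seqlock_pair *s, long *x, long *y)
{
    unsigned before, after;
    do {
        before = atomic_load(&s->seq);
        *x = atomic_load(&s->x);
        *y = atomic_load(&s->y);
        after = atomic_load(&s->seq);
        /* Retry if a writer was active or finished while we were reading. */
    } while ((before & 1) || before != after);
}
```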
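The clock algorithm from the third item can be sketched as follows, assuming a fixed array of frames and a caller that serializes eviction. The frame count and names are hypothetical; only the idea that a hit merely sets a per-frame referenced bit comes from the text above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CLOCK (second-chance) replacement over NFRAMES frames. */
#define NFRAMES 4

typedef struct {
    atomic_bool referenced[NFRAMES];
    size_t hand;              /* next frame the clock hand inspects */
} clock_cache;

void clock_init(clock_cache *c)
{
    for (size_t i = 0; i < NFRAMES; i++)
        atomic_init(&c->referenced[i], false);
    c->hand = 0;
}

/* Called on every cache hit: one store to the frame's own flag,
 * no global lock. */
void clock_touch(clock_cache *c, size_t frame)
{
    assert(frame < NFRAMES);
    atomic_store_explicit(&c->referenced[frame], true, memory_order_relaxed);
}

/* Eviction (assumed serialized by the caller): referenced frames get a
 * second chance and have their flag cleared; the first unreferenced frame
 * is the victim. If every frame is referenced, the first sweep clears
 * them all. */
size_t clock_evict(clock_cache *c)
{
    for (;;) {
        size_t f = c->hand;
        c->hand = (c->hand + 1) % NFRAMES;
        if (!atomic_exchange_explicit(&c->referenced[f], false,
                                      memory_order_relaxed))
            return f;
    }
}
```

For example, after touching frames 0 and 1 on a fresh cache, the first eviction skips both (clearing their bits) and returns frame 2.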

But according to the descriptions, the problem is a context switch
storm. I don't see how cache line bouncing could cause a context switch
storm. What causes it? If it's the pg_usleep in s_lock, then my patch
should help a lot: with pthread_rwlock locks, that line no longer exists.

--
    Manfred

Copyright © 1996-2014 The PostgreSQL Global Development Group