POSIX Spin Locks

gregory.menke at gsfc.nasa.gov
Thu Jul 20 17:04:41 UTC 2006


The test & set business offers throughput & scalability improvements
because it can operate without context switches - IFF the architecture
provides CAS or DCAS type instructions, the spinlock implementation
uses them, and there exists the possibility of another CPU entering
the spinlock.  For a single processor implementation it most likely
doesn't matter, since any spinning has already incurred the context
switch costs.  Put another way: assume task 1 got the spinlock.  Task
2 is higher priority and was scheduled (via interrupt or something);
it runs and attempts to enter the critical section, so there have
already been multiple scheduler events by the time task 2 accesses
the spinlock.
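
For reference, here's a rough sketch of the kind of test-and-set
spinning I mean, written with C11 atomics for readability (which
postdate this thread - on a real target you'd use the architecture's
CAS/test-and-set directly, but the shape is the same).  All names are
made up for illustration, none of this is RTEMS code:

    #include <stdatomic.h>

    typedef struct {
        atomic_flag locked;            /* clear = free, set = held */
    } spinlock_t;

    #define SPINLOCK_INITIALIZER { ATOMIC_FLAG_INIT }

    static void spin_lock(spinlock_t *l)
    {
        /* Busy-wait entirely in the caller, no scheduler involvement.
           On SMP, the unlocking CPU's store lets the spinner fall
           through the loop. */
        while (atomic_flag_test_and_set_explicit(&l->locked,
                                                 memory_order_acquire))
            ;                          /* spin */
    }

    static void spin_unlock(spinlock_t *l)
    {
        atomic_flag_clear_explicit(&l->locked, memory_order_release);
    }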

My interpretation is that for single processor boards a simple
mutex-based implementation will do fine.  It works for multiprocessor
systems too, but does not scale well because the OS must intervene to
manage the contention instead of pushing that job down to the hardware
via CAS/DCAS to avoid the overhead.
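
To make that concrete, the mutex-backed version could be as thin as
the sketch below.  This uses plain pthreads for illustration - a real
RTEMS version would presumably sit on a SuperCore mutex instead, and
the names here are hypothetical:

    #include <pthread.h>

    typedef struct {
        pthread_mutex_t m;
    } up_spinlock_t;

    static int up_spin_init(up_spinlock_t *l)
    {
        return pthread_mutex_init(&l->m, NULL);
    }

    static int up_spin_lock(up_spinlock_t *l)
    {
        /* Blocks until available - satisfies the POSIX wording
           ("shall not return ... until the lock becomes available")
           with no busy-waiting; the scheduler wakes us when the
           holder unlocks. */
        return pthread_mutex_lock(&l->m);
    }

    static int up_spin_trylock(up_spinlock_t *l)
    {
        return pthread_mutex_trylock(&l->m);  /* EBUSY if held */
    }

    static int up_spin_unlock(up_spinlock_t *l)
    {
        return pthread_mutex_unlock(&l->m);
    }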

I think the single processor case is degenerate and probably works
best when "spinning" tasks are suspended and awakened individually by
the preceding task clearing the spinlock.  A CAS/DCAS implementation
will keep the CPU running the scheduled task until the other processor
clears the spinlock, which allows the spinning CPU to resume - all
without OS involvement, which is why it scales well.  By implication,
spinlocks are viewed as multiprocessor-safe and efficient critical
sections, not as semaphores that regulate scheduling across some
possibly extended interval.
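
Which suggests the usage pattern the API is aimed at: hold the lock
for a handful of instructions, because another CPU burns cycles for
exactly as long as you hold it.  A hypothetical example using the
POSIX calls themselves:

    #include <pthread.h>

    static pthread_spinlock_t stats_lock;   /* hypothetical example */
    static unsigned long packet_count;

    void stats_init(void)
    {
        pthread_spin_init(&stats_lock, PTHREAD_PROCESS_PRIVATE);
    }

    void stats_bump(void)
    {
        /* Keep the held region tiny: another CPU may be spinning on
           this lock for as long as we hold it. */
        pthread_spin_lock(&stats_lock);
        packet_count++;
        pthread_spin_unlock(&stats_lock);
    }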

As one example, Sun used the CAS/DCAS approach to handle critical
sections in their slab memory allocation subsystem so Solaris would
scale better on multiprocessor boxes - but that's at 4 CPUs and above,
with the dramatic effects appearing at 16 and higher.

Greg


Joel Sherrill writes:
 > Hi,
 > 
 > I am looking at the POSIX spinlock specification and can't
 > see why it can't be implemented as some mutex variation.
 > 
 > Yes I understand the "test and set and poll if not available"
 > technique but the OpenGroup page doesn't require that behavior.
 > 
 > http://www.opengroup.org/onlinepubs/009695399/functions/pthread_spin_lock.html
 > 
 > ====================================
 > The pthread_spin_lock() function shall lock the spin lock referenced
 > by lock. The calling thread shall acquire the lock if it is not held
 > by another thread. Otherwise, the thread shall spin (that is, shall
 > not return from the pthread_spin_lock() call) until the lock becomes
 > available. The results are undefined if the calling thread holds the
 > lock at the time the call is made. The pthread_spin_trylock()
 > function shall lock the spin lock referenced by lock if it is not
 > held by any thread. Otherwise, the function shall fail.
 > ====================================
 > 
 > From a user's perspective, a mutex lock does not return until the lock
 > becomes available. 
 > 
 > RTEMS SuperCore mutexes are very efficient.  I don't see the point of
 > not doing a simple implementation based upon SuperCore mutexes.
 > 
 > Any thoughts, comments, insights?
 > 
 > --joel
 > 



