POSIX Mutex Performance
Joel Sherrill
joel at OARcorp.com
Thu Mar 25 15:18:02 UTC 2004
Hi,
I have been thinking about this one. The biggest
thing is the first one. Other ideas and comments
follow.
+ We mentioned this problem earlier but now I think I
can add some meat. RTEMS and Linux mutexes are
different and have different feature sets. RTEMS supports
the priority inheritance and priority ceiling protocols;
I see no hint of either in the Linux pthread mutex code.
The default mutex type for RTEMS and Linux is also different.
Linux picks a "fast" type which performs no error checking
and will allow you to deadlock. RTEMS always error checks
and has attributes on the base mutex controlling whether
nesting is legal or an error. Linux will let you DEADLOCK
in the default case!!
RTEMS is a real-time operating system and wants to
make the application's execution predictable. This
means providing tools to detect deadlock, avoid priority
inversion, have application limits on resource usage, etc.
Linux wasn't designed to meet those goals and the feature
set in this area shows that.
+ The Classic API Mutexes (rtems_semaphore*) are a bit more
optimized. For sure, the POSIX API goes through a wrapper
function which could technically be avoided to save a few
instructions.
+ One person on the list noticed that your Linux times varied
fairly significantly between two reports, and I did not
see an explanation. You might want to double-check the
timing mechanism using something similar to the procedure
in the tmck test.
+ RTEMS and glibc pthreads have different design approaches,
which impact the create/destroy times. RTEMS uses an
opaque ID, so even if the user mangles or dereferences it,
a bad ID won't harm the associated OS memory. The Linux
pthread code returns direct pointers to the user. The pointer
approach is a bit faster but not as safe/robust. The other
thing to note with this approach is that RTEMS explicitly
initializes every field and reuses a user-configured,
finite set of objects. These have to be reinitialized on
every use, so create and destroy are going to be more expensive.
+ In looking at the current glibc source, I don't see how there
can be that much difference in the number of instructions
actually executed. From what I can tell, they do not inline
anything into the application and when you get to lock,
they make actual subroutine calls. I could be misreading this.
A mutex is a very well understood OS object, and assuming
that everyone implemented it well, the differences are
going to be in the default behavior selected, implementation
structure/overhead, safety checks, and the like.
--
Joel Sherrill, Ph.D. Director of Research & Development
joel at OARcorp.com On-Line Applications Research
Ask me about RTEMS: a free RTOS Huntsville AL 35805
Support Available (256) 722-9985