[RTEMS Project] #2811: More robust thread dispatching on SMP and ARM Cortex-M

Wed Nov 16 07:43:30 UTC 2016

#2811: More robust thread dispatching on SMP and ARM Cortex-M
-----------------------------+------------------------------
 Reporter:  sebastian.huber  |       Owner:  sebastian.huber
     Type:  enhancement      |      Status:  new
 Priority:  normal           |   Milestone:  4.12
Component:  cpukit           |     Version:  4.11
 Severity:  normal           |  Resolution:
 Keywords:                   |
-----------------------------+------------------------------

Comment (by chrisj):

 Replying to [comment:4 sebastian.huber]:
 > I think a fatal error is more appropriate here.
 >
 > * Applications which have this usage error need to be fixed at
 > compile-time. It makes no sense to ship an SMP application with this bug.

 A fatal error is still a run-time error, not a compile-time error, so you
 have lost me here.

 >
 > * Return codes can be ignored. I definitely have seen code like this
 > before:
 > {{{
 > #!c
 > /* This cannot fail, we know the identifier is valid */
 > (void) pthread_mutex_lock(&mtx);
 > }}}
 >

 This is a different issue and a change of topic. We provide the means for
 errors to be analyzed and that is our boundary.
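
 To illustrate what analyzing the error on the caller's side could look
 like, here is a minimal sketch in plain POSIX C. The helpers
 `lock_checked()` and `init_errorcheck_mutex()` are hypothetical names
 made up for this example; they are not part of POSIX or RTEMS. The sketch
 assumes an error-checking mutex, where a relock by the owner is reported
 as `EDEADLK` rather than silently deadlocking.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: lock a mutex and report a failure instead of
 * discarding the status with a (void) cast.
 */
static int lock_checked(pthread_mutex_t *mtx, const char *where)
{
  int eno;

  assert(mtx != NULL);
  eno = pthread_mutex_lock(mtx);

  if (eno != 0) {
    /* pthread functions return the error number; errno is not used */
    fprintf(stderr, "%s: pthread_mutex_lock: %s\n", where, strerror(eno));
  }

  return eno;
}

/*
 * Hypothetical helper: create an error-checking mutex so that a relock by
 * the owning thread yields EDEADLK.
 */
static int init_errorcheck_mutex(pthread_mutex_t *mtx)
{
  pthread_mutexattr_t attr;
  int eno;
  int destroy_eno;

  eno = pthread_mutexattr_init(&attr);
  if (eno != 0) {
    return eno;
  }

  eno = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
  if (eno == 0) {
    eno = pthread_mutex_init(mtx, &attr);
  }

  destroy_eno = pthread_mutexattr_destroy(&attr);
  return eno != 0 ? eno : destroy_eno;
}
```

 With these helpers, a second `lock_checked()` call by the owning thread
 returns `EDEADLK` and prints a message, which is exactly the information
 the `(void)` cast in the quoted snippet throws away.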

 > * This ticket is a result of porting a real world application from
 > uni-processor to SMP.  If you are not an expert in the operating system
 > internals and your application has this bug, then you can easily need a
 > couple of days to figure out the problem.  So, it is important to make
 > sure it gets detected.

 I agree with detecting the issue and there being an error. It is the
 delivery we are discussing.

 The error code should provide some help, just like the fatal error code.
 If one can, the other can.
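
 The two delivery options under discussion can be sketched side by side.
 This is only a model in plain C: every identifier below (`model_*`,
 `MODEL_*`) is hypothetical and none of it is RTEMS code. It assumes the
 violation can be detected as a simple boolean condition at the point of
 the blocking call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Every name here is hypothetical; this only models the two options. */
enum { MODEL_STATUS_OK = 0, MODEL_STATUS_BAD_CONTEXT = 1 };
enum { MODEL_FATAL_SOURCE = 1, MODEL_FATAL_BAD_CONTEXT = 2 };

typedef void (*model_fatal_handler)(int source, int code);

/* Default handler: print the (source, code) pair and stop the program. */
static void model_fatal_default(int source, int code)
{
  fprintf(stderr, "fatal: source=%d code=%d\n", source, code);
  abort();
}

/* Recording handler for tests: remember the pair and return. */
static int model_last_source;
static int model_last_code;

static void model_fatal_record(int source, int code)
{
  model_last_source = source;
  model_last_code = code;
}

static model_fatal_handler model_fatal = model_fatal_default;

/* Option A: the caller gets a status and may check or ignore it. */
static int model_block_status(bool context_ok)
{
  return context_ok ? MODEL_STATUS_OK : MODEL_STATUS_BAD_CONTEXT;
}

/* Option B: the violation goes to the fatal handler and cannot be ignored. */
static void model_block_fatal(bool context_ok)
{
  assert(model_fatal != NULL);

  if (!context_ok) {
    (*model_fatal)(MODEL_FATAL_SOURCE, MODEL_FATAL_BAD_CONTEXT);
  }
}
```

 In both options the same code value reaches the user. The difference is
 only whether the caller can continue after the violation, which is the
 delivery question.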

 How many fatal error instances are there in the RTEMS kernel? Not the
 number of error codes, but the specific locations where a fatal error can
 be raised, i.e. code/line pairs? I have never audited this.

 >
 > * To figure out what caused a fatal error is easy. The (source, error)
 > pair uniquely identifies the source code location of the error.

 The source location is a line in the kernel's core code, which means users
 need to step into this code and figure out the answer. I have been hit by
 this with SMP and it is hard.

 > With a stack trace and the executing thread you get enough information
 > to locate the problem in the code. There is no need for a thread-aware
 > debugger.

 This implies testing will highlight the issue because you have a debugger
 to give you this data. Currently the standard or default stack traces
 RTEMS produces on a fatal error provide little if any information that
 could be used to resolve the exact source, e.g. the id of the executing
 thread or, even better, an unwinder (dreaming here). Better support for
 tier 1 archs would help.
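
 As a rough idea of what a more helpful default report could carry, the
 sketch below formats the (source, code) pair together with the id of the
 executing thread. The `fatal_report` structure and
 `fatal_report_format()` are hypothetical and written in plain C for
 illustration; they are not an existing RTEMS interface, and how the
 thread id would be obtained is left open.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical report record; the fields are assumptions, not RTEMS types. */
typedef struct {
  uint32_t source;    /* which subsystem raised the fatal error */
  uint32_t code;      /* error code within that source */
  uint32_t thread_id; /* identifier of the executing thread */
} fatal_report;

/*
 * Format the report into buf. Returns the snprintf() result, i.e. the
 * number of characters that the complete line needs, excluding the
 * terminating NUL.
 */
static int fatal_report_format(const fatal_report *r, char *buf, size_t n)
{
  assert(r != NULL);
  assert(buf != NULL);

  return snprintf(
    buf,
    n,
    "fatal: source=%u code=%u thread=0x%08x",
    (unsigned) r->source,
    (unsigned) r->code,
    (unsigned) r->thread_id
  );
}
```

 A single line like this on the console would already tell the user which
 thread to look at, without a debugger attached.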

 >
 > * This is a new constraint specific to SMP. Existing software may be
 > simply unaware of this issue. However, it is important to detect this
 > constraint violation.

 I agree it is important.

 > * _Thread_Do_dispatch() has no return value.  Adding this check to other
 > places would be much more difficult and error prone, with more space and
 > time overhead, and labour intensive to test.

 Are there no other similar tests happening now on the blocking paths?

--
Ticket URL: <http://devel.rtems.org/ticket/2811#comment:5>
RTEMS Project <http://www.rtems.org/>