[RTEMS Project] #2811: More robust thread dispatching on SMP and ARM Cortex-M

RTEMS trac trac at rtems.org
Wed Nov 16 08:07:30 UTC 2016


#2811: More robust thread dispatching on SMP and ARM Cortex-M
-----------------------------+------------------------------
 Reporter:  sebastian.huber  |       Owner:  sebastian.huber
     Type:  enhancement      |      Status:  new
 Priority:  normal           |   Milestone:  4.12
Component:  cpukit           |     Version:  4.11
 Severity:  normal           |  Resolution:
 Keywords:                   |
-----------------------------+------------------------------

Comment (by sebastian.huber):

 Replying to [comment:5 chrisj]:
 > Replying to [comment:4 sebastian.huber]:
 > > I think a fatal error is more appropriate here.
 > >
 > > * Applications which have this usage error need to be fixed at
 compile-time. It makes no sense to ship an SMP application with this bug.
 >
 > A fatal error is still run-time and not a compile time error so you have
 lost me here.

 It is an error that must be fixed during development.  Otherwise you have
 a broken product.

 >
 > >
 > > * Return codes can be ignored. I definitely have seen code like this
 before:
 > > {{{
 > > #!c
 > > /* This cannot fail, we know the identifier is valid */
 > > (void) pthread_mutex_lock(&mtx);
 > > }}}
 > >
 >
 > This is a different issue and a change of topic. We provide the means
 for errors to be analyzed and that is our boundary.
 >
 > > * This ticket is a result of porting a real world application from
 uni-processor to SMP.  If you are not an expert in the operating system
 internals and your application has this bug, then you can easily need a
 couple of days to figure out the problem.  So, it is important to make
 sure it gets detected.
 >
 > I agree with detecting the issue and there being an error. It is the
 delivery we are discussing.
 >
 > The error code should provide some help just like the fatal error code.
 If one can the other can.
 >
 > How many fatal error instances are there in RTEMS in the kernel? Not the
 number of error codes, but the specific locations a fatal error can appear,
 i.e. code/line pairs? I have never audited this.

 See Internal_errors_Core_list; we have a test for every fatal internal
 error.
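
 As a rough illustration of why one code per raise site helps, here is a
 minimal stand-alone sketch.  It is not RTEMS code: the enum, the table and
 the function names are hypothetical stand-ins, and the real contents of
 Internal_errors_Core_list are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a list of internal error codes (not the real one). */
typedef enum {
  MODEL_ERROR_BAD_DISPATCH_ENVIRONMENT,
  MODEL_ERROR_OUT_OF_WORKSPACE,
  MODEL_ERROR_COUNT
} model_internal_error;

/* Assumption of this sketch: each code is raised at exactly one place. */
static const char *const model_raise_site[] = {
  [MODEL_ERROR_BAD_DISPATCH_ENVIRONMENT] = "model_do_dispatch()",
  [MODEL_ERROR_OUT_OF_WORKSPACE] = "model_workspace_allocate()"
};

/* Compile-time check that the table extends to the last error code. */
static_assert(
  sizeof(model_raise_site) / sizeof(model_raise_site[0]) == MODEL_ERROR_COUNT,
  "the raise site table must cover all model error codes"
);

/* Map a reported code back to its single raise site, NULL if unknown. */
const char *model_locate(model_internal_error code)
{
  if ((unsigned) code >= MODEL_ERROR_COUNT) {
    return NULL;
  }

  return model_raise_site[code];
}

/* Run-time counterpart of "a test for every code": no entry may be missing. */
int model_every_code_has_site(void)
{
  for (unsigned i = 0; i < MODEL_ERROR_COUNT; ++i) {
    if (model_raise_site[i] == NULL) {
      return 0;
    }
  }

  return 1;
}
```

 With such a mapping, the code reported to a fatal error handler is enough
 to find the raising function without stepping through the kernel.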

 >
 > >
 > > * To figure out what caused a fatal error is easy. The (source, error)
 pair uniquely identifies the source code location of the error.
 >
 > The source location is a line in the kernel's core code which means users
 need to step into this code and figure out the answer. I have been hit by
 this with SMP and it is hard.
 >
 > > With a stack trace and the executing thread you get enough information
 to locate the problem in the code. There is no need for a thread aware
 debugger.
 >
 > This implies testing will highlight the issue because you have a
 debugger to give you this data. Currently RTEMS standard or default stack
 traces that get called on a fatal error provide little if any information
 that could be used to resolve the exact source, eg the thread id executing
 or even better an unwinder (dreaming here). Better support for tier 1
 archs would help.

 Improved fatal error diagnostics are a different topic.  With a debugger
 it is a matter of seconds to figure out the problem spot of a fatal error.

 >
 > >
 > > * This is a new constraint specific to SMP. Existing software may be
 simply unaware of this issue. However, it is important to detect this
 constraint violation.
 >
 > I agree it is important.
 >
 > > * _Thread_Do_dispatch() has no return value.  Adding this check to
 other places would be much more difficult and error prone, with more space
 and time overhead, and labour intensive to test.
 >
 > There are no other similar tests happening now on the blocking paths?

 No, this is a weak area in RTEMS.  For example, call
 rtems_task_wake_after() in an interrupt service routine.  You don't get
 any status information telling you that this is stupid.

 For now, I think a fatal error is sufficient. In case there is really a
 problem with this in the field, we can still improve things. What matters
 is that this constraint violation gets detected, otherwise you can spend
 hours on debugging.
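
 To make the trade-off concrete, here is a small stand-alone model of the
 approach.  Everything in it is hypothetical (the model_* names, the fatal
 code, and the choice of "called in interrupt context" as the violated
 constraint, taken from the example above); it is not the actual
 _Thread_Do_dispatch() implementation.  The blocking service always reports
 success, so only the single check in the dispatch routine reveals the
 misuse.

```c
#include <assert.h>

/* Hypothetical fatal codes for this model only. */
typedef enum {
  MODEL_FATAL_NONE,
  MODEL_FATAL_BAD_DISPATCH_ENVIRONMENT
} model_fatal_code;

static unsigned model_isr_nest_level;
static model_fatal_code model_last_fatal = MODEL_FATAL_NONE;

void model_isr_enter(void)
{
  ++model_isr_nest_level;
}

void model_isr_exit(void)
{
  assert(model_isr_nest_level > 0);
  --model_isr_nest_level;
}

/* A real fatal error would not return; the model only records the code. */
void model_fatal(model_fatal_code code)
{
  model_last_fatal = code;
}

model_fatal_code model_get_last_fatal(void)
{
  return model_last_fatal;
}

/* No return value, so a fatal error is the only way to report a violation. */
void model_do_dispatch(void)
{
  if (model_isr_nest_level > 0) {
    model_fatal(MODEL_FATAL_BAD_DISPATCH_ENVIRONMENT);
    return;
  }

  /* A context switch would happen here. */
}

/* Blocking service: its status says nothing about the calling context. */
int model_task_wake_after(unsigned ticks)
{
  (void) ticks;
  model_do_dispatch();
  return 0;
}
```

 In this model a call made between model_isr_enter() and model_isr_exit()
 still returns 0, yet model_get_last_fatal() reports the violation
 afterwards.  One check in the dispatch path covers every blocking service
 that ends up there.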

--
Ticket URL: <http://devel.rtems.org/ticket/2811#comment:6>
RTEMS Project <http://www.rtems.org/>