Random lwIP Crashes in _POSIX_Mutex_Lock_support()

Isaac Gutekunst isaac.gutekunst at vecna.com
Thu Oct 22 18:37:18 UTC 2015


I think I may have some information that's actually useful.

I've managed to actually execute some tests, and lots of them are failing.

sp01 and sp02 fail quite quickly with an assertion failure:

assertion "first != _Chain_Tail( &ready_queues[ index ] )" failed: file 
"../../cpukit/../../../stm32f7x/lib/         include/rtems/score/schedulerpriorityimpl.h", line 
166, function: _Scheduler_priority_Ready_queue_first

This failure is common to many of the failed tests so far. What does this mean?


Isaac

On 10/22/2015 09:16 AM, Jay Doyle wrote:
>
>
> On 10/22/2015 01:40 AM, Sebastian Huber wrote:
>>
>>
>> On 21/10/15 15:48, Jay Doyle wrote:
>>>
>>>
>>> On 10/21/2015 09:35 AM, Sebastian Huber wrote:
>>>>
>>>>
>>>> On 21/10/15 15:08, Isaac Gutekunst wrote:
>>>>>
>>>>>
>>>>> On 10/21/2015 09:00 AM, Sebastian Huber wrote:
>>>>>>
>>>>>>
>>>>>> On 21/10/15 14:56, Isaac Gutekunst wrote:
>>>>>>> On 10/21/2015 08:24 AM, Sebastian Huber wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 21/10/15 14:13, Isaac Gutekunst wrote:
>>>>>>>>> Thanks for the reply.
>>>>>>>>>
>>>>>>>>> On 10/21/2015 01:50 AM, Sebastian Huber wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 20/10/15 16:02, Isaac Gutekunst wrote:
>>>>>>>> [...]
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> As far as I can tell this would only occur if the caller of
>>>>>>>>>>> pthread_mutex_lock was in a
>>>>>>>>>>> "bad"
>>>>>>>>>>> state. I don't believe it is in an interrupt context, and
>>>>>>>>>>> don't know what other bad states
>>>>>>>>>>> could exist.
>>>>>>>>>>
>>>>>>>>>> We have
>>>>>>>>>>
>>>>>>>>>> #define _CORE_mutex_Check_dispatch_for_seize(_wait) \
>>>>>>>>>>    (!_Thread_Dispatch_is_enabled() \
>>>>>>>>>>      && (_wait) \
>>>>>>>>>>      && (_System_state_Get() >= SYSTEM_STATE_UP))
>>>>>>>>>>
>>>>>>>>>> What is the thread dispatch disable level and the system state
>>>>>>>>>> at this point?
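>>>>>>>>>>
>>>>>>>>>> Something like this is enough to dump both at the point of
>>>>>>>>>> interest (a sketch; printk() and the two accessors come from
>>>>>>>>>> <rtems/bspIo.h>, <rtems/score/threaddispatch.h> and
>>>>>>>>>> <rtems/score/sysstate.h>):
>>>>>>>>>>
>>>>>>>>>> #include <rtems/bspIo.h>
>>>>>>>>>> #include <rtems/score/threaddispatch.h>
>>>>>>>>>> #include <rtems/score/sysstate.h>
>>>>>>>>>>
>>>>>>>>>> static void dump_dispatch_state( void )
>>>>>>>>>> {
>>>>>>>>>>   printk(
>>>>>>>>>>     "dispatch enabled: %d, system state: %d\n",
>>>>>>>>>>     (int) _Thread_Dispatch_is_enabled(),
>>>>>>>>>>     (int) _System_state_Get()
>>>>>>>>>>   );
>>>>>>>>>> }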
>>>>>>>>>>
>>>>>>>>>> In case the thread dispatch disable level is not zero, then
>>>>>>>>>> something is probably broken
>>>>>>>>>> in the
>>>>>>>>>> operating system code which is difficult to find. Could be a
>>>>>>>>>> general memory corruption
>>>>>>>>>> problem
>>>>>>>>>> too. Which RTEMS version do you use?
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The thread dispatch disable level is usually -1 or -2
>>>>>>>>> (0xFFFFFFFF or 0xFFFFFFFE).
>>>>>>>>
>>>>>>>> A negative value is very bad, but easy to detect via manual
>>>>>>>> instrumentation (only a handful of spots touch this variable) or
>>>>>>>> hardware breakpoints/watchpoints. Does the rest of
>>>>>>>> _Per_CPU_Information look all right?
>>>>>>>>
>>>>>>> It looks like it's only the thread_dispatch_disable_level that's
>>>>>>> broken.
>>>>>>>
>>>>>>> We'll go and grep for all the places it's touched, and look for
>>>>>>> something.
>>>>>>>
>>>>>>> The problem with watchpoints is that they fire exceptionally often,
>>>>>>> and putting in a conditional watchpoint slows the code to a crawl,
>>>>>>> but that may be worth it.
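>>>>>>>
>>>>>>> For reference, here is the kind of cheap check we have in mind
>>>>>>> instead of a watchpoint (untested sketch; the field name is the
>>>>>>> one from the dump below, and _Per_CPU_Get() comes from
>>>>>>> <rtems/score/percpu.h>):
>>>>>>>
>>>>>>> #include <rtems/score/percpu.h>
>>>>>>>
>>>>>>> static inline void check_disable_level( void )
>>>>>>> {
>>>>>>>   /* The level is unsigned, so an underflow shows up as a huge
>>>>>>>      value; trap on anything implausibly large. */
>>>>>>>   if ( _Per_CPU_Get()->thread_dispatch_disable_level > 0x80000000U ) {
>>>>>>>     __asm__ volatile ( "bkpt #0" ); /* stop in the debugger */
>>>>>>>   }
>>>>>>> }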
>>>>>>>
>>>>>>> Here are some printouts of the relevant structs right after a crash:
>>>>>>>
>>>>>>> $4 = {
>>>>>>>   cpu_per_cpu = {<No data fields>},
>>>>>>>   isr_nest_level = 0,
>>>>>>>   thread_dispatch_disable_level = 4294967295,
>>>>>>>   executing = 0xc01585c8,
>>>>>>>   heir = 0xc0154038,
>>>>>>>   dispatch_necessary = true,
>>>>>>>   time_of_last_context_switch = {
>>>>>>>     sec = 2992,
>>>>>>>     frac = 10737447432380511034
>>>>>>>   },
>>>>>>>   Stats = {<No data fields>}
>>>>>>> }
>>>>>>
>>>>>> No, this doesn't look good. According to the stack trace you are in
>>>>>> thread context. However, we have executing != heir and
>>>>>> dispatch_necessary == true. This is itself a broken state. I guess
>>>>>> something is wrong with the interrupt level, so that a context
>>>>>> switch is blocked. On ARMv7-M this is done via the system call
>>>>>> exception.
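>>>>>>
>>>>>> One quick thing to check from thread context is BASEPRI, which the
>>>>>> ARMv7-M port uses to mask interrupts; a nonzero value where you
>>>>>> don't expect one would explain a blocked context switch. Untested
>>>>>> sketch:
>>>>>>
>>>>>> #include <stdint.h>
>>>>>>
>>>>>> static inline uint32_t read_basepri( void )
>>>>>> {
>>>>>>   uint32_t basepri;
>>>>>>
>>>>>>   __asm__ volatile ( "mrs %0, basepri" : "=r" ( basepri ) );
>>>>>>   return basepri;
>>>>>> }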
>>>>>>
>>>>> This is a bit beyond my RTEMS knowledge. What would you advise
>>>>> looking into?
>>>>
>>>> I would try to instrument the code to figure out where the thread
>>>> dispatch disable level goes negative.
>>>>
>>>
>>> We just did.  I added a check in _ARMV7M_Interrupt_service_leave to
>>> see if the _Thread_Dispatch_disable_level is positive before
>>> decrementing it, and this eventually fails.
>>>
>>> I'm not sure if this tells us much, because I think the call itself is
>>> correct.  In this particular case it is processing an I2C interrupt.
>>> I will try to see if we can capture information about the sequence of
>>> changes to the _Thread_Dispatch_disable_level just before the point at
>>> which we know something is clearly wrong (i.e., it decreasing below
>>> zero).
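>>>
>>> Roughly what I have in mind is a small ring buffer recording each
>>> change together with its caller (untested sketch, all names are ours):
>>>
>>> #include <stdint.h>
>>>
>>> #define LEVEL_TRACE_DEPTH 64
>>>
>>> static struct {
>>>   uint32_t level;   /* value after the change */
>>>   void    *caller;  /* return address of the code that changed it */
>>> } level_trace[ LEVEL_TRACE_DEPTH ];
>>>
>>> static volatile uint32_t level_trace_next;
>>>
>>> /* To be called from every spot that touches the disable level. */
>>> void trace_disable_level( uint32_t new_level )
>>> {
>>>   uint32_t i = level_trace_next++ % LEVEL_TRACE_DEPTH;
>>>
>>>   level_trace[ i ].level  = new_level;
>>>   level_trace[ i ].caller = __builtin_return_address( 0 );
>>> }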
>>
>> Since the isr_nest_level is 0, I don't think it's a problem with the spots that use
>> _ARMV7M_Interrupt_service_leave(). Did you check the interrupt priorities? See also
>>
>> https://lists.rtems.org/pipermail/users/2015-June/029155.html
>>
> Thanks for the pointer to this posting.  It seems like a very similar situation to what we are
> experiencing -- especially considering that we invoke an RTEMS call in our Ethernet ISR.
> Unfortunately, all our interrupts use the default interrupt priority level set in the bsp
> header file as:
>
> #define BSP_ARMV7M_IRQ_PRIORITY_DEFAULT (13 << 4)
>
> which should mean that they are all non-NMI unless we explicitly set their interrupt priority
> lower.
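>
> If it helps, this is the kind of guard we could put around our handler
> installs (a sketch only: the wrapper name is ours, the install call is
> the standard rtems_interrupt_handler_install(), and comparing against
> BSP_ARMV7M_IRQ_PRIORITY_DEFAULT assumes the BSP's BASEPRI setting
> matches that boundary):
>
> #include <bsp.h>
> #include <rtems/irq-extension.h>
>
> rtems_status_code install_rtems_safe_isr(
>   rtems_vector_number     vector,
>   rtems_interrupt_handler handler,
>   void                   *arg,
>   uint32_t                priority  /* NVIC priority, set elsewhere */
> )
> {
>   /* Numerically lower is higher priority on ARMv7-M; anything the
>      kernel cannot mask via BASEPRI must not call RTEMS services. */
>   if ( priority < BSP_ARMV7M_IRQ_PRIORITY_DEFAULT ) {
>     return RTEMS_INVALID_PRIORITY;
>   }
>
>   return rtems_interrupt_handler_install(
>     vector, "checked", RTEMS_INTERRUPT_UNIQUE, handler, arg
>   );
> }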
>
>
>
>

