Random lwIP Crashes in _POSIX_Mutex_Lock_support()

Joel Sherrill joel.sherrill at oarcorp.com
Fri Oct 23 12:02:34 UTC 2015



On October 22, 2015 1:37:18 PM CDT, Isaac Gutekunst <isaac.gutekunst at vecna.com> wrote:
>I think I may have some information that's actually useful.
>
>I've managed to actually execute some tests.... and lots of them are
>failing.
>
>sp01 and sp02 fail quite quickly, as an assertion fails.
>
>assertion "first != _Chain_Tail( &ready_queues[ index ] )" failed: file
>
>"../../cpukit/../../../stm32f7x/lib/        
>include/rtems/score/schedulerpriorityimpl.h", line 
>166, function: _Scheduler_priority_Ready_queue_first
>
>This failure is common to many of the failed tests so far. What does
>this mean?
>

Does hello run?

>Isaac
>
>On 10/22/2015 09:16 AM, Jay Doyle wrote:
>>
>>
>> On 10/22/2015 01:40 AM, Sebastian Huber wrote:
>>>
>>>
>>> On 21/10/15 15:48, Jay Doyle wrote:
>>>>
>>>>
>>>> On 10/21/2015 09:35 AM, Sebastian Huber wrote:
>>>>>
>>>>>
>>>>> On 21/10/15 15:08, Isaac Gutekunst wrote:
>>>>>>
>>>>>>
>>>>>> On 10/21/2015 09:00 AM, Sebastian Huber wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 21/10/15 14:56, Isaac Gutekunst wrote:
>>>>>>>> On 10/21/2015 08:24 AM, Sebastian Huber wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 21/10/15 14:13, Isaac Gutekunst wrote:
>>>>>>>>>> Thanks for the reply.
>>>>>>>>>>
>>>>>>>>>> On 10/21/2015 01:50 AM, Sebastian Huber wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 20/10/15 16:02, Isaac Gutekunst wrote:
>>>>>>>>> [...]
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> As far as I can tell this would only occur if the caller of
>>>>>>>>>>>> pthread_mutex_lock was in a "bad" state. I don't believe it
>>>>>>>>>>>> is in an interrupt context, and don't know what other bad
>>>>>>>>>>>> states could exist.
>>>>>>>>>>>
>>>>>>>>>>> We have
>>>>>>>>>>>
>>>>>>>>>>> #define _CORE_mutex_Check_dispatch_for_seize(_wait) \
>>>>>>>>>>>    (!_Thread_Dispatch_is_enabled() \
>>>>>>>>>>>      && (_wait) \
>>>>>>>>>>>      && (_System_state_Get() >= SYSTEM_STATE_UP))
>>>>>>>>>>>
>>>>>>>>>>> What is the thread dispatch disable level and the system
>>>>>>>>>>> state at this point?
>>>>>>>>>>>
>>>>>>>>>>> In case the thread dispatch disable level is not zero, then
>>>>>>>>>>> something is probably broken in the operating system code,
>>>>>>>>>>> which is difficult to find. Could be a general memory
>>>>>>>>>>> corruption problem too. Which RTEMS version do you use?
>>>>>>>>>>>
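A quick way to see both values the quoted macro tests is a temporary
printk right before the seize. This is only a sketch against a 4.11-era
tree where the level lives in Per_CPU_Control (as in the dump further
down); the helper name dump_dispatch_state() is made up:

#include <rtems/bspIo.h>
#include <rtems/score/percpu.h>
#include <rtems/score/sysstate.h>
#include <rtems/score/threaddispatch.h>

/* Temporary debug helper: print the values the seize check looks at. */
static void dump_dispatch_state( const char *where )
{
  const Per_CPU_Control *cpu_self = _Per_CPU_Get();

  printk(
    "%s: disable level %u, dispatch enabled %d, system state %d\n",
    where,
    (unsigned) cpu_self->thread_dispatch_disable_level,
    (int) _Thread_Dispatch_is_enabled(),
    (int) _System_state_Get()
  );
}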
>>>>>>>>>>
>>>>>>>>>> The thread dispatch disable level is usually -1 or -2
>>>>>>>>>> (0xFFFFFFFF or 0xFFFFFFFE).
>>>>>>>>>
>>>>>>>>> A negative value is very bad, but easy to detect via manual
>>>>>>>>> instrumentation (only a handful of spots touch this variable)
>>>>>>>>> or hardware breakpoints/watchpoints. Does the rest of
>>>>>>>>> _Per_CPU_Information look all right?
>>>>>>>>>
>>>>>>>> It looks like it's only the thread_dispatch_disable_level that's
>>>>>>>> broken.
>>>>>>>>
>>>>>>>> We'll go and grep for all the places it's touched and look for
>>>>>>>> something.
>>>>>>>>
>>>>>>>> The problem with watchpoints is they fire exceptionally often,
>>>>>>>> and putting in a conditional watchpoint slows the code to a
>>>>>>>> crawl, but that may be worth it.
>>>>>>>>
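One way around the cost of a conditional hardware watchpoint is a cheap
software check dropped into the handful of spots that modify the level.
The macro name below is made up, and it assumes the same per-CPU layout
as the dump that follows:

#include <rtems/bspIo.h>
#include <rtems/score/percpu.h>

/* Hypothetical check: park the CPU as soon as the level has wrapped
 * around (gone "negative"), so the offending caller is still on the
 * stack when a debugger is attached. */
#define CHECK_DISABLE_LEVEL() \
  do { \
    if ( _Per_CPU_Get()->thread_dispatch_disable_level >= 0x80000000U ) { \
      printk( "disable level underflow at %s:%d\n", __FILE__, __LINE__ ); \
      for ( ;; ) { \
        /* spin here so the debugger can inspect the backtrace */ \
      } \
    } \
  } while ( 0 )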
>>>>>>>> Here are some printouts of the relevant structs right after a
>>>>>>>> crash:
>>>>>>>>
>>>>>>>> $4 = {
>>>>>>>>   cpu_per_cpu = {<No data fields>},
>>>>>>>>   isr_nest_level = 0,
>>>>>>>>   thread_dispatch_disable_level = 4294967295,
>>>>>>>>   executing = 0xc01585c8,
>>>>>>>>   heir = 0xc0154038,
>>>>>>>>   dispatch_necessary = true,
>>>>>>>>   time_of_last_context_switch = {
>>>>>>>>     sec = 2992,
>>>>>>>>     frac = 10737447432380511034
>>>>>>>>   },
>>>>>>>>   Stats = {<No data fields>}
>>>>>>>> }
>>>>>>>
>>>>>>> No, this doesn't look good. According to the stack trace you are
>>>>>>> in thread context. However, we
>>>>>>> have executing != heir and dispatch_necessary == true. This is a
>>>>>>> broken state itself. I guess,
>>>>>>> something is wrong with the interrupt level so that a context
>>>>>>> switch is blocked. On ARMv7-M
>>>>>>> this is done via the system call exception.
>>>>>>>
>>>>>> This is a bit beyond my RTEMS knowledge. What would you advise
>>>>>> looking into?
>>>>>
>>>>> I would try to instrument the code to figure out where the thread
>>>>> dispatch disable level goes negative.
>>>>>
>>>>
>>>> We just did.  I added a check in _ARMV7M_Interrupt_service_leave to
>>>> see if the _Thread_Dispatch_disable_level is positive before
>>>> decrementing it, and this eventually fails.
>>>>
>>>> I'm not sure if this tells us much because I think the call itself
>>>> is correct.  In this particular case it is processing an I2C
>>>> interrupt.  I will try to see if we can capture information about
>>>> the sequence of changes to the _Thread_Dispatch_disable_level just
>>>> before the point at which we know something is clearly wrong (i.e.,
>>>> decreasing it below zero).
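For capturing that sequence, a small trace ring filled in from the same
instrumented spots may be enough to show which callers drive the level
negative. Everything below (names, size) is made up for the sketch, and
it is only meant to be inspected post-mortem from the debugger:

#include <stdint.h>
#include <rtems/score/percpu.h>

typedef struct {
  const char *where;   /* __FILE__ or a short tag from the call site */
  int         line;    /* __LINE__ at the call site */
  uint32_t    level;   /* value of the level after the change */
} Level_trace_entry;

#define LEVEL_TRACE_SIZE 64

static Level_trace_entry level_trace[ LEVEL_TRACE_SIZE ];
static uint32_t level_trace_next;

/* Call this right after every increment or decrement of the level. */
static inline void level_trace_record( const char *where, int line )
{
  uint32_t i = level_trace_next++ % LEVEL_TRACE_SIZE;

  level_trace[ i ].where = where;
  level_trace[ i ].line  = line;
  level_trace[ i ].level = _Per_CPU_Get()->thread_dispatch_disable_level;
}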
>>>
>>> Since the isr_nest_level is 0, I don't think it's a problem with the
>>> spots that use _ARMV7M_Interrupt_service_leave(). Did you check the
>>> interrupt priorities? See also
>>>
>>> https://lists.rtems.org/pipermail/users/2015-June/029155.html
>>>
>> Thanks for the pointer to this posting.  It seems like a very similar
>> situation to what we are experiencing -- especially considering that
>> we invoke an RTEMS call in our Ethernet ISR.  Unfortunately, all our
>> interrupts use the default interrupt priority level set in the BSP
>> header file as:
>>
>> #define BSP_ARMV7M_IRQ_PRIORITY_DEFAULT (13 << 4)
>>
>> which should mean that they are all non-NMIs unless we explicitly set
>> their interrupt level lower.
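If it helps to double-check, the priorities actually programmed into the
NVIC can be dumped at run time. This assumes the ARMv7-M score header
provides the _ARMV7M_NVIC_Get_priority() inline (please check armv7m.h
in your tree), that BSP_ARMV7M_IRQ_PRIORITY_DEFAULT comes from your BSP
header as quoted above, and that the vector list is supplied by the
application:

#include <stddef.h>
#include <rtems/bspIo.h>
#include <rtems/score/armv7m.h>
#include <bsp.h>

/* Flag anything configured more urgent (numerically lower) than the BSP
 * default so it can be reviewed against the NMI threshold discussed in
 * the thread linked above. */
static void dump_irq_priorities( const int *vectors, size_t count )
{
  size_t i;

  for ( i = 0; i < count; ++i ) {
    int prio = _ARMV7M_NVIC_Get_priority( vectors[ i ] );

    printk(
      "IRQ %d: priority %d%s\n",
      vectors[ i ],
      prio,
      prio < BSP_ARMV7M_IRQ_PRIORITY_DEFAULT ?
        " (more urgent than the default)" : ""
    );
  }
}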
>>
>>
>>
>>

--joel


