Random lwIP Crashes in _POSIX_Mutex_Lock_support()

Sebastian Huber sebastian.huber at embedded-brains.de
Thu Oct 22 05:40:04 UTC 2015



On 21/10/15 15:48, Jay Doyle wrote:
>
>
> On 10/21/2015 09:35 AM, Sebastian Huber wrote:
>>
>>
>> On 21/10/15 15:08, Isaac Gutekunst wrote:
>>>
>>>
>>> On 10/21/2015 09:00 AM, Sebastian Huber wrote:
>>>>
>>>>
>>>> On 21/10/15 14:56, Isaac Gutekunst wrote:
>>>>> On 10/21/2015 08:24 AM, Sebastian Huber wrote:
>>>>>>
>>>>>>
>>>>>> On 21/10/15 14:13, Isaac Gutekunst wrote:
>>>>>>> Thanks for the reply.
>>>>>>>
>>>>>>> On 10/21/2015 01:50 AM, Sebastian Huber wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 20/10/15 16:02, Isaac Gutekunst wrote:
>>>>>> [...]
>>>>>>>>
>>>>>>>>>
>>>>>>>>> As far as I can tell this would only occur if the caller of
>>>>>>>>> pthread_mutex_lock was in a "bad" state. I don't believe it is
>>>>>>>>> in an interrupt context, and don't know what other bad states
>>>>>>>>> could exist.
>>>>>>>>
>>>>>>>> We have
>>>>>>>>
>>>>>>>> #define _CORE_mutex_Check_dispatch_for_seize(_wait) \
>>>>>>>>    (!_Thread_Dispatch_is_enabled() \
>>>>>>>>      && (_wait) \
>>>>>>>>      && (_System_state_Get() >= SYSTEM_STATE_UP))
>>>>>>>>
>>>>>>>> What is the thread dispatch disable level and the system state
>>>>>>>> at this point?
>>>>>>>>
>>>>>>>> In case the thread dispatch disable level is not zero, then
>>>>>>>> something is probably broken in the operating system code which
>>>>>>>> is difficult to find. Could be a general memory corruption
>>>>>>>> problem too. Which RTEMS version do you use?
>>>>>>>>
>>>>>>>
>>>>>>> The thread dispatch disable level is usually -1 or -2
>>>>>>> (0xFFFFFFFF or 0xFFFFFFFE).
>>>>>>
>>>>>> A negative value is very bad, but easy to detect via manual
>>>>>> instrumentation (only a handful of spots touch this variable) or
>>>>>> hardware breakpoints/watchpoints. Does the rest of
>>>>>> _Per_CPU_Information look all right?
>>>>>>
>>>>> It looks like it's only the thread_dispatch_disable_level that's
>>>>> broken.
>>>>>
>>>>> We'll go and grep for all the places it's touched, and look for
>>>>> something.
>>>>>
>>>>> The problem with watchpoints is they fire exceptionally often, and
>>>>> putting in a conditional
>>>>> watchpoint slows the code to a crawl, but that may be worth it.
>>>>>
>>>>> Here are some printouts of the relevant structs right after a crash:
>>>>>
>>>>> $4 = {
>>>>>   cpu_per_cpu = {<No data fields>},
>>>>>   isr_nest_level = 0,
>>>>>   thread_dispatch_disable_level = 4294967295,
>>>>>   executing = 0xc01585c8,
>>>>>   heir = 0xc0154038,
>>>>>   dispatch_necessary = true,
>>>>>   time_of_last_context_switch = {
>>>>>     sec = 2992,
>>>>>     frac = 10737447432380511034
>>>>>   },
>>>>>   Stats = {<No data fields>}
>>>>> }
>>>>
>>>> No, this doesn't look good. According to the stack trace you are in
>>>> thread context. However, we have executing != heir and
>>>> dispatch_necessary == true. This is a broken state in itself. I
>>>> guess something is wrong with the interrupt level so that a context
>>>> switch is blocked. On ARMv7-M this is done via the system call
>>>> exception.
>>>>
>>> This is a bit beyond my RTEMS knowledge. What would you advise
>>> looking into?
>>
>> I would try to instrument the code to figure out where the thread
>> dispatch disable level goes negative.
>>
>
> We just did.  I added a check in _ARMV7M_Interrupt_service_leave to
> see if the _Thread_Dispatch_disable_level is positive before
> decrementing it and this eventually fails.
>
> I'm not sure if this tells us much because I think the call itself is
> correct.  In this particular case it is processing an I2C interrupt.
> I will try to see if we can capture information about the sequence of
> changes to the _Thread_Dispatch_disable_level just before the point at
> which we know something is clearly wrong (i.e., decreasing it below
> zero).

Since the isr_nest_level is 0, I don't think it's a problem with the
spots that use _ARMV7M_Interrupt_service_leave(). Did you check the
interrupt priorities? See also

https://lists.rtems.org/pipermail/users/2015-June/029155.html
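
To capture the sequence of changes leading up to the bad decrement, a
small ring buffer around the counter may be enough. The following is
only a sketch under assumptions: level_tracker, tracker_increment(),
tracker_decrement() and tracker_recent() are hypothetical names that
model the unsigned thread dispatch disable level on the host; they are
not RTEMS functions. In the real system the record step would have to
be added by hand at each spot that touches the variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an unsigned disable level; not RTEMS code. */
#define TRACE_DEPTH 8u /* power of two, so the index stays consistent */

typedef struct {
  const char *site;   /* label of the code location making the change */
  uint32_t    before; /* level before the change */
  uint32_t    after;  /* level after the change */
} trace_entry;

typedef struct {
  uint32_t    level;             /* models the unsigned disable level */
  trace_entry ring[TRACE_DEPTH]; /* the last TRACE_DEPTH changes */
  uint32_t    count;             /* total number of recorded changes */
  bool        underflow_seen;    /* latched on first decrement from 0 */
  const char *underflow_site;    /* site of that first bad decrement */
} level_tracker;

static void tracker_init(level_tracker *t)
{
  *t = (level_tracker){0};
}

/* Store one change in the ring buffer and apply it to the level. */
static void tracker_record(level_tracker *t, const char *site,
                           uint32_t after)
{
  assert(site != NULL);
  trace_entry *e = &t->ring[t->count % TRACE_DEPTH];
  e->site = site;
  e->before = t->level;
  e->after = after;
  t->count++;
  t->level = after;
}

static void tracker_increment(level_tracker *t, const char *site)
{
  tracker_record(t, site, t->level + 1u);
}

static void tracker_decrement(level_tracker *t, const char *site)
{
  /* Latch the first decrement that starts from zero. */
  if (t->level == 0u && !t->underflow_seen) {
    t->underflow_seen = true;
    t->underflow_site = site;
  }
  /* Unsigned arithmetic wraps: 0 - 1 yields 4294967295 (0xFFFFFFFF). */
  tracker_record(t, site, t->level - 1u);
}

/* Return the i-th most recent change (0 is the newest), or NULL. */
static const trace_entry *tracker_recent(const level_tracker *t,
                                         uint32_t i)
{
  if (i >= t->count || i >= TRACE_DEPTH) {
    return NULL;
  }
  return &t->ring[(t->count - 1u - i) % TRACE_DEPTH];
}
```

Once underflow_seen is set, walking tracker_recent() from index 0
upwards gives the last few changes in reverse order, which should show
which increment/decrement pair is unbalanced. Recording only the last
few entries keeps the overhead much lower than a conditional
watchpoint.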

-- 
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.huber at embedded-brains.de
PGP     : Public key available on request.

This message is not a business communication within the meaning of the EHUG.
