Random lwIP Crashes in _POSIX_Mutex_Lock_support()

Wed Oct 21 13:35:18 UTC 2015

On 21/10/15 15:08, Isaac Gutekunst wrote:
>
>
> On 10/21/2015 09:00 AM, Sebastian Huber wrote:
>>
>>
>> On 21/10/15 14:56, Isaac Gutekunst wrote:
>>> On 10/21/2015 08:24 AM, Sebastian Huber wrote:
>>>>
>>>>
>>>> On 21/10/15 14:13, Isaac Gutekunst wrote:
>>>>> Thanks for the reply.
>>>>>
>>>>> On 10/21/2015 01:50 AM, Sebastian Huber wrote:
>>>>>>
>>>>>>
>>>>>> On 20/10/15 16:02, Isaac Gutekunst wrote:
>>>> [...]
>>>>>>
>>>>>>>
>>>>>>> As far as I can tell this would only occur if the caller of
>>>>>>> pthread_mutex_lock was in a "bad" state. I don't believe it is in
>>>>>>> an interrupt context, and don't know what other bad states could
>>>>>>> exist.
>>>>>>
>>>>>> We have
>>>>>>
>>>>>> #define _CORE_mutex_Check_dispatch_for_seize(_wait) \
>>>>>>    (!_Thread_Dispatch_is_enabled() \
>>>>>>      && (_wait) \
>>>>>>      && (_System_state_Get() >= SYSTEM_STATE_UP))
>>>>>>
>>>>>> What is the thread dispatch disable level and the system state at 
>>>>>> this point?
>>>>>>
>>>>>> In case the thread dispatch disable level is not zero, then
>>>>>> something is probably broken in the operating system code which is
>>>>>> difficult to find. Could be a general memory corruption problem
>>>>>> too. Which RTEMS version do you use?
>>>>>>
>>>>>
>>>>> The thread dispatch disable level is usually -1 or -2
>>>>> (0xFFFFFFFF or 0xFFFFFFFE).
>>>>
>>>> A negative value is very bad, but easy to detect via manual
>>>> instrumentation (only a handful of spots touch this variable) or
>>>> hardware breakpoints/watchpoints. Does the rest of
>>>> _Per_CPU_Information look all right?
>>>>
>>> It looks like it's only the thread_dispatch_disable_level that's 
>>> broken.
>>>
>>> We'll go and grep for all the places it's touched, and look for
>>> anything suspicious.
>>>
>>> The problem with watchpoints is they fire exceptionally often, and
>>> putting in a conditional watchpoint slows the code to a crawl, but
>>> that may be worth it.
>>>
>>> Here are some printouts of the relevant structs right after a crash:
>>>
>>> $4 = {
>>>   cpu_per_cpu = {<No data fields>},
>>>   isr_nest_level = 0,
>>>   thread_dispatch_disable_level = 4294967295,
>>>   executing = 0xc01585c8,
>>>   heir = 0xc0154038,
>>>   dispatch_necessary = true,
>>>   time_of_last_context_switch = {
>>>     sec = 2992,
>>>     frac = 10737447432380511034
>>>   },
>>>   Stats = {<No data fields>}
>>> }
>>
>> No, this doesn't look good. According to the stack trace you are in
>> thread context. However, we have executing != heir and
>> dispatch_necessary == true. This is a broken state in itself. I guess
>> something is wrong with the interrupt level so that a context switch
>> is blocked. On ARMv7-M this is done via the system call exception.
>>
> This is a bit beyond my RTEMS knowledge. What would you advise looking 
> into? 

I would try to instrument the code to figure out where the thread 
dispatch disable level goes negative.
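
A minimal sketch of what such instrumentation could look like, kept
standalone so it compiles without the RTEMS headers. Every name below
(checked_decrement, CHECKED_DISPATCH_ENABLE, first_underflow_file and so
on) is hypothetical and not part of the RTEMS API; the counter is a plain
stand-in for the per-CPU field in the dump above. The idea is only to
record the first call site that decrements a level which is already zero,
since that is the point where the unsigned value wraps to 4294967295.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU counter shown in the dump above;
 * the real field lives in RTEMS' _Per_CPU_Information. */
static uint32_t thread_dispatch_disable_level;

/* Hypothetical record of the first decrement that hit a zero level. */
static const char *first_underflow_file;
static int first_underflow_line;

/* Decrement the level, remembering the call site if it was already zero. */
static uint32_t checked_decrement(const char *file, int line)
{
  if (thread_dispatch_disable_level == 0 && first_underflow_file == NULL) {
    first_underflow_file = file; /* a convenient place for a breakpoint */
    first_underflow_line = line;
  }

  return --thread_dispatch_disable_level; /* unsigned wrap: 0 -> 0xFFFFFFFF */
}

/* Hypothetical wrappers to use at the spots that touch the counter. */
#define CHECKED_DISPATCH_ENABLE() checked_decrement(__FILE__, __LINE__)
#define CHECKED_DISPATCH_DISABLE() (++thread_dispatch_disable_level)
```

In a real tree the check would have to go into the few places that
actually decrement the level, and an unconditional breakpoint on the
recording branch should be much cheaper than a conditional watchpoint,
because it only triggers on the faulty decrement.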

-- 
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.huber at embedded-brains.de
PGP     : Public key available on request.

This message is not a business communication within the meaning of the EHUG.