Random lwIP Crashes in _POSIX_Mutex_Lock_support()

Wed Oct 21 13:43:59 UTC 2015

On 21/10/15 15:35, Sebastian Huber wrote:
>
>
> On 21/10/15 15:08, Isaac Gutekunst wrote:
>>
>>
>> On 10/21/2015 09:00 AM, Sebastian Huber wrote:
>>>
>>>
>>> On 21/10/15 14:56, Isaac Gutekunst wrote:
>>>> On 10/21/2015 08:24 AM, Sebastian Huber wrote:
>>>>>
>>>>>
>>>>> On 21/10/15 14:13, Isaac Gutekunst wrote:
>>>>>> Thanks for the reply.
>>>>>>
>>>>>> On 10/21/2015 01:50 AM, Sebastian Huber wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 20/10/15 16:02, Isaac Gutekunst wrote:
>>>>> [...]
>>>>>>>
>>>>>>>>
>>>>>>>> As far as I can tell this would only occur if the caller of
>>>>>>>> pthread_mutex_lock was in a "bad" state. I don't believe it is in
>>>>>>>> an interrupt context, and don't know what other bad states could
>>>>>>>> exist.
>>>>>>>
>>>>>>> We have
>>>>>>>
>>>>>>> #define _CORE_mutex_Check_dispatch_for_seize(_wait) \
>>>>>>>    (!_Thread_Dispatch_is_enabled() \
>>>>>>>      && (_wait) \
>>>>>>>      && (_System_state_Get() >= SYSTEM_STATE_UP))
>>>>>>>
>>>>>>> What is the thread dispatch disable level and the system state 
>>>>>>> at this point?
>>>>>>>
>>>>>>> In case the thread dispatch disable level is not zero, then
>>>>>>> something is probably broken in the operating system code which
>>>>>>> is difficult to find. Could be a general memory corruption
>>>>>>> problem too. Which RTEMS version do you use?
>>>>>>>
>>>>>>
>>>>>> The thread dispatch disable level is usually -1 or -2
>>>>>> (0xFFFFFFFF or 0xFFFFFFFE).
>>>>>
>>>>> A negative value is very bad, but easy to detect via manual
>>>>> instrumentation (only a handful of spots touch this variable) or
>>>>> hardware breakpoints/watchpoints. Does the rest of
>>>>> _Per_CPU_Information look all right?
>>>>>
>>>> It looks like it's only the thread_dispatch_disable_level that's 
>>>> broken.
>>>>
>>>> We'll go and grep for all the places it's touched, and look for
>>>> something.
>>>>
>>>> The problem with watchpoints is they fire exceptionally often, and 
>>>> putting in a conditional
>>>> watchpoint slows the code to a crawl, but that may be worth it.
>>>>
>>>> Here are some printouts of the relevant structs right after a crash:
>>>>
>>>> $4 = {
>>>>   cpu_per_cpu = {<No data fields>},
>>>>   isr_nest_level = 0,
>>>>   thread_dispatch_disable_level = 4294967295,
>>>>   executing = 0xc01585c8,
>>>>   heir = 0xc0154038,
>>>>   dispatch_necessary = true,
>>>>   time_of_last_context_switch = {
>>>>     sec = 2992,
>>>>     frac = 10737447432380511034
>>>>   },
>>>>   Stats = {<No data fields>}
>>>> }
>>>
>>> No, this doesn't look good. According to the stack trace you are in 
>>> thread context. However, we
>>> have executing != heir and dispatch_necessary == true. This is a 
>>> broken state itself. I guess,
>>> something is wrong with the interrupt level so that a context switch 
>>> is blocked. On ARMv7-M
>>> this is done via the system call exception.
>>>
>> This is a bit beyond my RTEMS knowledge. What would you advise 
>> looking into? 
>
> I would try to instrument the code to figure out where the thread 
> dispatch disable level goes negative. 

I would also read the file doc/cpu_supplement/arm.t and look for ARMv7-M.
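
As a rough illustration of the kind of instrumentation I mean, here is a minimal sketch in plain C. It is not RTEMS code: dispatch_disable_level is a stand-in for the per-CPU thread_dispatch_disable_level, and checked_disable(), checked_enable() and CHECKED_ENABLE() are hypothetical names made up for this example. The idea is only to catch the first unbalanced enable before the unsigned counter wraps around, and to remember which call site did it.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the per-CPU thread dispatch disable level (assumption:
 * an unsigned 32-bit counter, as the 4294967295 in the dump suggests). */
static uint32_t dispatch_disable_level;

/* Call site of the first unbalanced enable; NULL while none was seen. */
static const char *underflow_file;
static int underflow_line;

/* Hypothetical wrapper for every spot that increments the level. */
static void checked_disable(void)
{
  ++dispatch_disable_level;
}

/* Hypothetical wrapper for every spot that decrements the level.
 * Returns the new level. If the level is already zero, the call site is
 * recorded and the counter is left alone instead of wrapping around. */
static uint32_t checked_enable(const char *file, int line)
{
  if (dispatch_disable_level == 0) {
    if (underflow_file == NULL) {
      underflow_file = file;
      underflow_line = line;
    }
    /* On a real target this is the place for a breakpoint or a fatal
     * error, so that the offending stack is still intact. */
    return 0;
  }

  return --dispatch_disable_level;
}

/* Records the caller's file and line automatically. */
#define CHECKED_ENABLE() checked_enable(__FILE__, __LINE__)
```

With something like this in place of the raw increments and decrements, a breakpoint on the underflow branch fires only once, at the offending caller, which avoids the cost of a conditional watchpoint.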

-- 
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.huber at embedded-brains.de
PGP     : Public key available on request.

This message is not a business communication within the meaning of the EHUG.