Random lwIP Crashes in _POSIX_Mutex_Lock_support()

Sebastian Huber sebastian.huber at embedded-brains.de
Wed Oct 21 13:00:36 UTC 2015



On 21/10/15 14:56, Isaac Gutekunst wrote:
> On 10/21/2015 08:24 AM, Sebastian Huber wrote:
>>
>>
>> On 21/10/15 14:13, Isaac Gutekunst wrote:
>>> Thanks for the reply.
>>>
>>> On 10/21/2015 01:50 AM, Sebastian Huber wrote:
>>>>
>>>>
>>>> On 20/10/15 16:02, Isaac Gutekunst wrote:
>> [...]
>>>>
>>>>>
>>>>> As far as I can tell this would only occur if the caller of 
>>>>> pthread_mutex_lock was in a "bad"
>>>>> state. I don't believe it is in an interrupt context, and don't 
>>>>> know what other bad states
>>>>> could exist.
>>>>
>>>> We have
>>>>
>>>> #define _CORE_mutex_Check_dispatch_for_seize(_wait) \
>>>>    (!_Thread_Dispatch_is_enabled() \
>>>>      && (_wait) \
>>>>      && (_System_state_Get() >= SYSTEM_STATE_UP))
>>>>
>>>> What is the thread dispatch disable level and the system state at 
>>>> this point?
>>>>
>>>> In case the thread dispatch disable level is not zero, then 
>>>> something is probably broken in the
>>>> operating system code which is difficult to find. Could be a 
>>>> general memory corruption problem
>>>> too. Which RTEMS version do you use?
>>>>
>>>
>>> The thread dispatch disable level is usually -1 or -2.
>>> (0xFFFFFFFE or 0xFFFFFFD).
>>
>> A negative value is very bad, but easy to detect via manual 
>> instrumentation (only an hand full
>> of spots touch this variable) or hardware breakpoints/watchpoints. 
>> Looks the rest of
>> _Per_CPU_Information all right?
>>
> It looks like it's only the thread_dispatch_disable_level that's broken.
>
> We'll go and grep for all places for all the places it's touched, and 
> look for something.
>
> The problem with watchpoints is they fire exceptionally often, and 
> putting in a conditional watchpoint slows the code to a crawl, but 
> that may be worth it.
>
> Here are some printouts of the relevant structs right after a crash:
>
> $4 = {
>   cpu_per_cpu = {<No data fields>},
>   isr_nest_level = 0,
>   thread_dispatch_disable_level = 4294967295,
>   executing = 0xc01585c8,
>   heir = 0xc0154038,
>   dispatch_necessary = true,
>   time_of_last_context_switch = {
>     sec = 2992,
>     frac = 10737447432380511034
>   },
>   Stats = {<No data fields>}
> } 

No, this doesn't look good. According to the stack trace you are in 
thread context. However, we have executing != heir and 
dispatch_necessary == true. This is a broken state itself. I guess, 
something is wrong with the interrupt level so that a context switch is 
blocked. On ARMv7-M this is done via the system call exception.

-- 
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.huber at embedded-brains.de
PGP     : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.




More information about the devel mailing list