Random lwIP Crashes in _POSIX_Mutex_Lock_support()

Wed Oct 21 13:08:39 UTC 2015


On 10/21/2015 09:00 AM, Sebastian Huber wrote:
>
>
> On 21/10/15 14:56, Isaac Gutekunst wrote:
>> On 10/21/2015 08:24 AM, Sebastian Huber wrote:
>>>
>>>
>>> On 21/10/15 14:13, Isaac Gutekunst wrote:
>>>> Thanks for the reply.
>>>>
>>>> On 10/21/2015 01:50 AM, Sebastian Huber wrote:
>>>>>
>>>>>
>>>>> On 20/10/15 16:02, Isaac Gutekunst wrote:
>>> [...]
>>>>>
>>>>>>
>>>>>> As far as I can tell this would only occur if the caller of pthread_mutex_lock was in a
>>>>>> "bad"
>>>>>> state. I don't believe it is in an interrupt context, and don't know what other bad states
>>>>>> could exist.
>>>>>
>>>>> We have
>>>>>
>>>>> #define _CORE_mutex_Check_dispatch_for_seize(_wait) \
>>>>>    (!_Thread_Dispatch_is_enabled() \
>>>>>      && (_wait) \
>>>>>      && (_System_state_Get() >= SYSTEM_STATE_UP))
>>>>>
>>>>> What is the thread dispatch disable level and the system state at this point?
>>>>>
>>>>> In case the thread dispatch disable level is not zero, then something is probably broken
>>>>> in the
>>>>> operating system code which is difficult to find. Could be a general memory corruption
>>>>> problem
>>>>> too. Which RTEMS version do you use?
>>>>>
>>>>
>>>> The thread dispatch disable level is usually -1 or -2.
>>>> (0xFFFFFFFE or 0xFFFFFFD).
>>>
>>> A negative value is very bad, but easy to detect via manual instrumentation (only an hand full
>>> of spots touch this variable) or hardware breakpoints/watchpoints. Looks the rest of
>>> _Per_CPU_Information all right?
>>>
>> It looks like it's only the thread_dispatch_disable_level that's broken.
>>
>> We'll go and grep for all places for all the places it's touched, and look for something.
>>
>> The problem with watchpoints is they fire exceptionally often, and putting in a conditional
>> watchpoint slows the code to a crawl, but that may be worth it.
>>
>> Here are some printouts of the relevant structs right after a crash:
>>
>> $4 = {
>>   cpu_per_cpu = {<No data fields>},
>>   isr_nest_level = 0,
>>   thread_dispatch_disable_level = 4294967295,
>>   executing = 0xc01585c8,
>>   heir = 0xc0154038,
>>   dispatch_necessary = true,
>>   time_of_last_context_switch = {
>>     sec = 2992,
>>     frac = 10737447432380511034
>>   },
>>   Stats = {<No data fields>}
>> }
>
> No, this doesn't look good. According to the stack trace you are in thread context. However, we
> have executing != heir and dispatch_necessary == true. This is a broken state itself. I guess,
> something is wrong with the interrupt level so that a context switch is blocked. On ARMv7-M
> this is done via the system call exception.
>
This is a bit beyond my RTEMS knowledge. What would you advise looking into?