Random lwIP Crashes in _POSIX_Mutex_Lock_support()
Sebastian Huber
sebastian.huber at embedded-brains.de
Wed Oct 21 13:35:18 UTC 2015
On 21/10/15 15:08, Isaac Gutekunst wrote:
>
>
> On 10/21/2015 09:00 AM, Sebastian Huber wrote:
>>
>>
>> On 21/10/15 14:56, Isaac Gutekunst wrote:
>>> On 10/21/2015 08:24 AM, Sebastian Huber wrote:
>>>>
>>>>
>>>> On 21/10/15 14:13, Isaac Gutekunst wrote:
>>>>> Thanks for the reply.
>>>>>
>>>>> On 10/21/2015 01:50 AM, Sebastian Huber wrote:
>>>>>>
>>>>>>
>>>>>> On 20/10/15 16:02, Isaac Gutekunst wrote:
>>>> [...]
>>>>>>
>>>>>>>
>>>>>>> As far as I can tell this would only occur if the caller of
>>>>>>> pthread_mutex_lock was in a "bad" state. I don't believe it is in
>>>>>>> an interrupt context, and don't know what other bad states could
>>>>>>> exist.
>>>>>>
>>>>>> We have
>>>>>>
>>>>>> #define _CORE_mutex_Check_dispatch_for_seize(_wait) \
>>>>>>   (!_Thread_Dispatch_is_enabled() \
>>>>>>     && (_wait) \
>>>>>>     && (_System_state_Get() >= SYSTEM_STATE_UP))
>>>>>>
>>>>>> What is the thread dispatch disable level and the system state at
>>>>>> this point?
>>>>>>
>>>>>> If the thread dispatch disable level is not zero, then something
>>>>>> is probably broken in the operating system code, which is
>>>>>> difficult to find. It could be a general memory corruption
>>>>>> problem too. Which RTEMS version do you use?
>>>>>>
>>>>>
>>>>> The thread dispatch disable level is usually -1 or -2
>>>>> (0xFFFFFFFF or 0xFFFFFFFE).
>>>>
>>>> A negative value is very bad, but easy to detect via manual
>>>> instrumentation (only a handful of spots touch this variable) or
>>>> hardware breakpoints/watchpoints. Does the rest of
>>>> _Per_CPU_Information look all right?
>>>>
>>> It looks like it's only the thread_dispatch_disable_level that's
>>> broken.
>>>
>>> We'll go and grep for all the places it's touched, and look for
>>> something.
>>>
>>> The problem with watchpoints is they fire exceptionally often, and
>>> putting in a conditional
>>> watchpoint slows the code to a crawl, but that may be worth it.
>>>
>>> Here are some printouts of the relevant structs right after a crash:
>>>
>>> $4 = {
>>>   cpu_per_cpu = {<No data fields>},
>>>   isr_nest_level = 0,
>>>   thread_dispatch_disable_level = 4294967295,
>>>   executing = 0xc01585c8,
>>>   heir = 0xc0154038,
>>>   dispatch_necessary = true,
>>>   time_of_last_context_switch = {
>>>     sec = 2992,
>>>     frac = 10737447432380511034
>>>   },
>>>   Stats = {<No data fields>}
>>> }
>>
>> No, this doesn't look good. According to the stack trace you are in
>> thread context. However, we have executing != heir and
>> dispatch_necessary == true. This is a broken state in itself. I
>> guess something is wrong with the interrupt level so that a context
>> switch is blocked. On ARMv7-M this is done via the system call
>> exception.
>>
> This is a bit beyond my RTEMS knowledge. What would you advise looking
> into?
I would try to instrument the code to figure out where the thread
dispatch disable level goes negative.
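
A minimal sketch of what such instrumentation could look like, written as
stand-alone C so it can be tried on a host first. All names here
(dispatch_disable_level, checked_increment(), checked_decrement(),
DISPATCH_DISABLE(), DISPATCH_ENABLE(), record_underflow()) are hypothetical
stand-ins and not RTEMS identifiers; the real spots that touch the per-CPU
variable have to be found by grep. The idea is only to check for zero before
every decrement and remember the first call site that would wrap the counter.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-CPU thread dispatch disable level. */
volatile uint32_t dispatch_disable_level;

/* Call site of the first underflow; stays NULL/0 until one happens. */
const char *underflow_file;
int underflow_line;

/* Remember only the first offender; later ones are follow-up damage. */
void record_underflow(const char *file, int line)
{
  assert(file != NULL);
  if (underflow_file == NULL) {
    underflow_file = file;
    underflow_line = line;
    /* On a target, a breakpoint here may be more practical than output. */
    fprintf(stderr, "disable level underflow at %s:%d\n", file, line);
  }
}

/* Instrumented replacement for "disable thread dispatching". */
uint32_t checked_increment(void)
{
  return ++dispatch_disable_level;
}

/* Instrumented replacement for "enable thread dispatching". */
uint32_t checked_decrement(const char *file, int line)
{
  if (dispatch_disable_level == 0) {
    /* An enable without a matching disable: the counter is about to wrap. */
    record_underflow(file, line);
  }
  return --dispatch_disable_level;
}

/* Macros so that the call site is captured automatically. */
#define DISPATCH_DISABLE() checked_increment()
#define DISPATCH_ENABLE()  checked_decrement(__FILE__, __LINE__)
```

With this in place, an unbalanced DISPATCH_ENABLE() wraps the unsigned counter
to 4294967295, which matches the value in the printout above, and the
recorded file and line point at the first unbalanced call instead of the
later crash in the mutex code.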
--
Sebastian Huber, embedded brains GmbH
Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail : sebastian.huber at embedded-brains.de
PGP : Public key available on request.
This message is not a business communication within the meaning of the EHUG.