Random lwIP Crashes in _POSIX_Mutex_Lock_support()

Isaac Gutekunst isaac.gutekunst at vecna.com
Wed Oct 21 12:13:07 UTC 2015


Thanks for the reply.

On 10/21/2015 01:50 AM, Sebastian Huber wrote:
>
>
> On 20/10/15 16:02, Isaac Gutekunst wrote:
>> Hi Devel,
>>
>> I'm pretty sure this is a devel question, not users.
>>
>>
>> I'm working with a colleague at Vecna to port lwIP to the STM32F7 BSP we've developed.
>>
>> We have a basic HTTP server that prints out the current list of tasks. We refresh the page at
>> a very high rate, and after about 1-30 minutes, get a crash.
>>
>> Every time the exception is thrown after _CORE_mutex_Check_dispatch_for_seize( wait )  on
>> line 254 of coremuteximpl.h. Every time this is inside a pthread_mutex_lock() call.
>>
>>
>> Here is the full backtrace:
>>
>> stm32fxxxx_fatal_error_handler() at hal-fatal-error-handler.c:126 0x800af92
>> _User_extensions_Fatal_visitor() at userextiterate.c:123 0x803212c
>> _User_extensions_Iterate() at userextiterate.c:166 0x80321c0
>> _User_extensions_Fatal() at userextimpl.h:254 0x802a85e
>> _Terminate() at interr.c:44 0x802a888
>> _CORE_mutex_Seize_body() at coremuteximpl.h:255 0x8068df0
>> _POSIX_Mutex_Lock_support() at mutexlocksupp.c:57 0x806907e
>> pthread_mutex_lock() at mutexlock.c:40 0x8068bee
>> sys_arch_sem_wait() at sys_arch.c:485 0x808da8a
>> sys_arch_mbox_fetch() at sys_arch.c:357 0x808d804
>> sys_timeouts_mbox_fetch() at timers.c:532 0x80883ce
>> tcpip_thread() at tcpip.c:95 0x808c170
>> _Thread_Handler() at threadhandler.c:102 0x806bbe8
>> _User_extensions_Thread_exitted() at userextimpl.h:244 0x806bb60
>> bsp_section_work_begin() at 0xc016a12c
>>
>>
>> However, the lwip code calling pthread_mutex_lock varies, but is consistently from lwIP.
>>
>>
>> Does this ring any bells?
>
> Normally you get this if you obtain a locked mutex in interrupt context, but your stack trace
> says you are not.

That was my first suspicion as well.
>
>>
>> As far as I can tell this would only occur if the caller of pthread_mutex_lock was in a "bad"
>> state. I don't believe it is in an interrupt context, and don't know what other bad states
>> could exist.
>
> We have
>
> #define _CORE_mutex_Check_dispatch_for_seize(_wait) \
>    (!_Thread_Dispatch_is_enabled() \
>      && (_wait) \
>      && (_System_state_Get() >= SYSTEM_STATE_UP))
>
> What is the thread dispatch disable level and the system state at this point?
>
> In case the thread dispatch disable level is not zero, then something is probably broken in the
> operating system code which is difficult to find. Could be a general memory corruption problem
> too. Which RTEMS version do you use?
>

The thread dispatch disable level is usually -1 or -2
(0xFFFFFFFF or 0xFFFFFFFE when read as an unsigned 32-bit value).
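
To make explicit why such a value trips the check Sebastian quoted, here is a minimal host-side model of that macro. It is only a sketch: struct kernel_model, enum system_state and check_dispatch_for_seize() are hypothetical stand-ins, not RTEMS definitions, and it assumes that "dispatch enabled" means a disable level of exactly zero and that the level is stored as an unsigned 32-bit integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel state; not the RTEMS definitions. */
enum system_state { STATE_BEFORE_INIT, STATE_UP, STATE_TERMINATED };

struct kernel_model {
    uint32_t dispatch_disable_level; /* assumed unsigned 32-bit */
    enum system_state state;
};

/*
 * Mirrors the structure of the quoted macro: returns true when a
 * blocking seize is attempted while thread dispatching is disabled
 * and the system is up, i.e. the "bad state" case.
 */
static bool check_dispatch_for_seize(const struct kernel_model *k, bool wait)
{
    /* Assumption: dispatching is enabled only when the level is zero. */
    bool dispatch_enabled = (k->dispatch_disable_level == 0);

    return !dispatch_enabled && wait && k->state >= STATE_UP;
}

/* Small self-check covering the cases discussed in this thread. */
static void check_dispatch_for_seize_selftest(void)
{
    struct kernel_model k = { 0, STATE_UP };

    /* Normal thread context: level is zero, blocking is allowed. */
    assert(!check_dispatch_for_seize(&k, true));

    /* Underflowed counter: nonzero, so the check reports a bad state. */
    k.dispatch_disable_level = 0xFFFFFFFEu;
    assert(check_dispatch_for_seize(&k, true));
}
```

Under these assumptions any nonzero level, including one that wrapped below zero, is indistinguishable from "dispatching disabled", so a blocking pthread_mutex_lock() from an ordinary thread would hit the fatal error path even though no interrupt is involved.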

We first suspected that _Thread_Dispatch_decrement_disable_level (in threaddispatch.h) was
being called too many times (somehow). However, the crash always occurs without the check
below firing.

For the record, I inserted this snippet of code:

     if (disable_level < 0) {
         _Terminate(
             INTERNAL_ERROR_CORE,
             true,
             INTERNAL_ERROR_MUTEX_OBTAIN_FROM_BAD_STATE
         );
         /* In case the _Terminate call doesn't work */
         __asm__ volatile ("BKPT #01");
     }

This pointed us towards a general memory corruption issue, so we are a bit stuck. Another 
avenue we are exploring is sticking tests for a negative disable_level all over the code hoping 
to get closer to the corruption.
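
One caveat when scattering such tests: if disable_level is an unsigned 32-bit integer, as the hex values above suggest (this is an assumption about the tree in question), then a comparison like disable_level < 0 can never be true, and an underflow would go unnoticed by that test. A minimal sketch of two alternatives follows. MAX_PLAUSIBLE_NESTING, disable_level_looks_corrupt() and checked_decrement() are hypothetical names chosen for illustration, not RTEMS APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bound: real nesting depths should stay far below this. */
#define MAX_PLAUSIBLE_NESTING 64u

/*
 * Range test for an unsigned counter. A wrapped value such as
 * 0xFFFFFFFE is huge when unsigned, so it fails this test even
 * though "level < 0" would always be false.
 */
static bool disable_level_looks_corrupt(uint32_t level)
{
    return level > MAX_PLAUSIBLE_NESTING;
}

/*
 * Decrement that reports an underflow at the moment it happens,
 * by testing for zero before subtracting.
 */
static uint32_t checked_decrement(uint32_t level, bool *underflow)
{
    *underflow = (level == 0);
    return level - 1u; /* unsigned wrap-around is well defined in C */
}

/* Small self-check of both helpers. */
static void disable_level_selftest(void)
{
    bool underflow = false;

    assert(!disable_level_looks_corrupt(2u));
    assert(disable_level_looks_corrupt(0xFFFFFFFEu));

    assert(checked_decrement(1u, &underflow) == 0u && !underflow);
    assert(checked_decrement(0u, &underflow) == 0xFFFFFFFFu && underflow);
}
```

The pre-decrement test catches an unbalanced decrement at its source. The range test is the one to sprinkle elsewhere, since it also catches a counter overwritten by unrelated memory corruption. If the counter turns out to be a signed type, the original comparison is fine as written.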

We are running a fork based on 314ff3c43ff1c00232e201df68e39cc0e5600d95. Our changes since then 
include the addition of our STM32F BSPs, but no changes to the kernel except a new CAN driver.


Real trace functionality would be really nice, but we lack the hardware (both trace probes, and 
exposed trace lines).

This is probably a stretch, but does anyone have experience getting the ETM or ITM sending data 
to the ETB and getting the data over JTAG? (with RTEMS and GCC)

Isaac

