[PATCH 2/3] bsps/i386: use Pentium instructions for pc586 and pc686 builds.
Pavel Pisa
pisa at cmp.felk.cvut.cz
Wed Oct 12 13:36:00 UTC 2016
Hello Sebastian,
On Wednesday 12 of October 2016 10:35:55 Sebastian Huber wrote:
> On 12/10/16 10:26, pisa at cmp.felk.cvut.cz wrote:
> > SMP build is broken with i386 set because libatomic and GCC
> > generate infinite loop for __atomic_fetch_add_4 used
> > in rtems_interrupt_lock_acquire
> >
> > __atomic_fetch_add_4:
> > push %ebp
> > mov %esp,%ebp
> > movl $0x5,0x10(%ebp)
> > pop %ebp
> > jmp __atomic_fetch_add_4
>
> Do you have a test case for this compiler/RTEMS bug? The use of
> libatomic is inefficient, but it should work.
Maybe it is a problem of my i386 toolchain build; I have not
updated it since April.
The following is a simple test:
------------------------------------------------
#include <stdatomic.h>

/* Shared atomic variable exercised by both operations. */
atomic_uint atvar1;

/* Volatile results keep the calls from being optimized away. */
volatile unsigned int res1;
volatile unsigned int res2;

int main(void)
{
	res1 = atomic_fetch_or(&atvar1, 0x55);
	res2 = atomic_fetch_add(&atvar1, 0xaa);
	return 0;
}
------------------------------------------------
The following build commands are used:
i386-rtems4.12-gcc --pipe -B/opt/rtems4.12/i386-rtems4.12/pc686/lib/ -specs
bsp_specs -qrtems -I /opt/rtems4.12/i386-rtems4.12/pc686/lib/include -march=i386 -Wall -O2 -g -ffunction-sections -fdata-sections -o
libatomic-add-test.o -c libatomic-add-test.c
i386-rtems4.12-gcc --pipe -B/opt/rtems4.12/i386-rtems4.12/pc686/lib/ -specs
bsp_specs -qrtems -mtune=pentiumpro -march=pentium -Wall -O2 -g -ffunction-sections -fdata-sections -Wl,--gc-sections -Wl,-Ttext,0x00100000
libatomic-add-test.o -o libatomic-test-add
The problem appears with and without -march=i386; when -march is something
newer (pentium), everything is OK.
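As a minimal illustration of why -march matters (my own sketch, not part of
the original build; the explanation in the comment is my assumption):
------------------------------------------------
/* Sketch only: with -march=i486 or newer, GCC can expand this builtin
 * inline as "lock xadd", since xadd exists from the 80486 on.  With
 * plain -march=i386 there is no xadd/cmpxchg, so GCC falls back to
 * calling the libatomic helper __atomic_fetch_add_4 instead. */
unsigned int add_atomically(unsigned int *p, unsigned int v)
{
	return __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
}
------------------------------------------------
The generated code can be inspected with i386-rtems4.12-objdump -d on the
object file.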
The disassembly looks like:
------------------------------------------------
00120a76 <__atomic_fetch_add_4>:
120a76: 55 push %ebp
120a77: 89 e5 mov %esp,%ebp
120a79: c7 45 10 05 00 00 00 movl $0x5,0x10(%ebp)
120a80: 5d pop %ebp
120a81: eb f3 jmp 120a76 <__atomic_fetch_add_4>
00120a83 <__atomic_add_fetch_4>:
120a83: 55 push %ebp
120a84: 89 e5 mov %esp,%ebp
120a86: c7 45 10 05 00 00 00 movl $0x5,0x10(%ebp)
120a8d: 5d pop %ebp
120a8e: eb e6 jmp 120a76 <__atomic_fetch_add_4>
00120a90 <__atomic_fetch_or_4>:
120a90: 55 push %ebp
120a91: 89 e5 mov %esp,%ebp
120a93: 56 push %esi
120a94: 53 push %ebx
120a95: 83 ec 0c sub $0xc,%esp
120a98: 8b 5d 08 mov 0x8(%ebp),%ebx
120a9b: 53 push %ebx
120a9c: e8 df 66 00 00 call 127180 <_Libatomic_Protect_start>
120aa1: 8b 33 mov (%ebx),%esi
120aa3: 8b 55 0c mov 0xc(%ebp),%edx
120aa6: 09 f2 or %esi,%edx
120aa8: 89 13 mov %edx,(%ebx)
120aaa: 5a pop %edx
120aab: 59 pop %ecx
120aac: 50 push %eax
120aad: 53 push %ebx
120aae: e8 ed 66 00 00 call 1271a0 <_Libatomic_Protect_end>
120ab3: 8d 65 f8 lea -0x8(%ebp),%esp
120ab6: 89 f0 mov %esi,%eax
120ab8: 5b pop %ebx
120ab9: 5e pop %esi
120aba: 5d pop %ebp
120abb: c3 ret
------------------------------------------------
_Libatomic_Protect_start is provided by RTEMS.
------------------------------------------------
00127180 <_Libatomic_Protect_start>:
__uint32_t _Libatomic_Protect_start( void *ptr )
{
ISR_Level isr_level;
(void) ptr;
_ISR_Local_disable( isr_level );
127180: 9c pushf
127181: fa cli
127182: 58 pop %eax
static inline bool _CPU_atomic_Flag_test_and_set( CPU_atomic_Flag *obj, CPU_atomic_Order order )
{
#if defined(_RTEMS_SCORE_CPUSTDATOMIC_USE_ATOMIC)
return obj->test_and_set( order );
#elif defined(_RTEMS_SCORE_CPUSTDATOMIC_USE_STDATOMIC)
return atomic_flag_test_and_set_explicit( obj, order );
127183: b1 01 mov $0x1,%cl
127185: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
127189: 8d bc 27 00 00 00 00 lea 0x0(%edi,%eiz,1),%edi
127190: 88 ca mov %cl,%dl
127192: 86 15 8c 85 13 00 xchg %dl,0x13858c
#if defined(RTEMS_SMP)
while (
127198: 84 d2 test %dl,%dl
12719a: 75 f4 jne 127190 <_Libatomic_Protect_start+0x10>
------------------------------------------------
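For comparison, the working __atomic_fetch_or_4 above takes the protected
fallback path; roughly like the following (my sketch reconstructed from the
disassembly, not the verbatim libatomic source, and the
_Libatomic_Protect_end signature is my assumption):
------------------------------------------------
/* Sketch only: the OR is done non-atomically inside a critical
 * section guarded by the RTEMS-provided protect hooks. */
unsigned int
sketch_fetch_or_4 (unsigned int *ptr, unsigned int val)
{
	__uint32_t isr_level = _Libatomic_Protect_start (ptr); /* disable ISRs / take lock */
	unsigned int old = *ptr;
	*ptr = old | val;
	_Libatomic_Protect_end (ptr, isr_level);                /* restore previous state */
	return old;
}
------------------------------------------------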
When I check the current GCC repository code and compare
https://gcc.gnu.org/viewcvs/gcc/trunk/libatomic/fadd_n.c?revision=232055&view=markup
https://gcc.gnu.org/viewcvs/gcc/trunk/libatomic/fior_n.c?revision=187018&view=markup
then I would interpret the atomic add case as forcing the use of
GCC-generated code through HAVE_ATOMIC_FETCH_OP_4, which I expect GCC then
resolves as a call to the helper __atomic_fetch_add_4, because plain i386 has
no guaranteed atomic add opcode (does lock not work there?). Tail-recursion
optimization then turns that call into the jump.
So it seems strange compared to the original 2012 code version.
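If my reading is right, the effect is roughly the following (my own sketch of
what fadd_n.c effectively compiles down to, not the actual libatomic source):
------------------------------------------------
/* Sketch only: when HAVE_ATOMIC_FETCH_OP_4 is assumed, libatomic's
 * fadd_n.c boils down to invoking the compiler builtin.  If GCC cannot
 * expand the builtin inline for the selected -march, it lowers it back
 * to a call to __atomic_fetch_add_4, i.e. to this very function, and
 * tail-call optimization produces the jmp to the function start seen
 * in the disassembly above. */
unsigned int
__atomic_fetch_add_4 (unsigned int *ptr, unsigned int val, int memorder)
{
	return __atomic_fetch_add (ptr, val, memorder);
}
------------------------------------------------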
Do you have any idea?
It would be worth trying with the official RSB-built toolchain; I will update
my tools when I find more time.
Best wishes,
Pavel