Zynq and 4.11 dynamic loader

Thu Sep 10 01:59:34 UTC 2015

On 9/09/2015 11:07 am, Joel Sherrill wrote:
> 
> 
> On 9/8/2015 5:26 PM, Pavel Pisa wrote:
>> Hello Mathew,
>>
>> On Tuesday 08 of September 2015 16:26:24 Mathew Benson wrote:
>>> I understand 4.11 has dynamic loader support, but only for PowerPC and
>>> MIPS(?).  Does anybody have dynamic loading working on the Zynq ARM?
>>
>> there are sources for runtime dlopen for arm, i386, m68k,
>> powerpc, bfin, lm32, mips, sparc, h8300, m32r, moxie
>> and v850 included in RTEMS-4.11 rtems/cpukit/libdl.
>> I am not sure about status, dl01 and dl02 tests
>> are disabled only for some lm3s3749, lpc23xx and lpc2362
>> targets.
> 
> Good answer. :)
> 

Yes it is. For the ARM, veneer support needs to be added. This is an
outstanding task for me to do.

Veneer support is a documented ARM process where relative branching with
a limited range is converted to long jumps with a signed 32-bit range.
This issue was highlighted by the GSoC work a couple of years ago where
we ported Python to RTEMS and had libdl load it. The SPARC worked but
the ARM required these fix-ups. The issue can be avoided by a compile
flag to force all branches to use long jumps but this is awkward and we
should support veneers.

The veneer change is complex because it adds an extra pass over the
object file being loaded to determine the memory needed for the 32-bit
jumps. It may also place a size restriction on the object file being
loaded. If you are using incremental linking on the host you could
exceed the relative branch range if the veneers are outside the image
range. I am not sure how the GNU ld handles large incrementally linked
object files.
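
The extra pass described above can be sketched roughly as follows. This
is only an illustration, not the libdl implementation: the branch_reloc
record, the function names, and the 8-byte veneer size (an
"ldr pc, [pc, #-4]" followed by a 32-bit address) are assumptions made
for the example. The range limits are those of the ARM-mode B/BL
encoding, which holds a signed 24-bit word offset relative to the
instruction address plus 8.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ARM-mode B/BL reach, measured from the instruction address plus 8. */
#define ARM_BRANCH_MIN (-(INT64_C(1) << 25))     /* -32 MiB */
#define ARM_BRANCH_MAX ((INT64_C(1) << 25) - 4)  /* +32 MiB - 4 */

/* Assumed veneer layout: "ldr pc, [pc, #-4]" plus a 32-bit literal. */
#define VENEER_BYTES 8u

/* Hypothetical record for one branch relocation in the object file. */
typedef struct {
  uint32_t site;   /* address of the branch instruction once loaded */
  uint32_t target; /* address of the symbol it branches to */
} branch_reloc;

/* True if a direct branch at 'site' can reach 'target'. */
static bool arm_branch_in_range(uint32_t site, uint32_t target)
{
  int64_t offset = (int64_t) target - ((int64_t) site + 8);
  return offset >= ARM_BRANCH_MIN && offset <= ARM_BRANCH_MAX;
}

/* Extra pass: add up the veneer memory for every branch that cannot
 * reach its target directly. One veneer per branch is assumed here. */
static size_t veneer_bytes_needed(const branch_reloc *relocs, size_t count)
{
  size_t needed = 0;
  for (size_t i = 0; i < count; ++i) {
    if (!arm_branch_in_range(relocs[i].site, relocs[i].target))
      needed += VENEER_BYTES;
  }
  return needed;
}
```

A real loader could share one veneer between branches to the same
target, and the veneers themselves must be placed within branch range
of the code that uses them, which is where the size restriction comes
from.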

> The sources are there but not all are working. The cpukit/configure.ac
> disables the architectures which don't pass dl01 in general.
> 
> https://git.rtems.org/rtems/tree/cpukit/configure.ac#n380
> 
> That's bfin, h8300, lm32, and v850. Each has a specific ticket number
> in a comment disabling it.
> 
> The tests do often find themselves disabled on BSPs which don't
> have enough RAM to run them. That's why some of the small ARM
> boards have it disabled at the testsuite level.
> 
>> I have tested dlopen on i386 and arm in last week
>> to check what can be done with it and to find
>> memory requirements etc. Chris and others are much
>> more competent in this topic but I present results
>> of my playing there with hope that it could be
>> useful for somebody else.
> 
> I honestly have no idea on memory requirements. Chris would
> be the one to answer that. Ultimately, you have to have memory
> to write the code into. :)

The memory usage is difficult to define. It depends on the way a user
sets up the application, the number of symbols exported, and the media
used to hold the object files. Each object file held in memory has an
overhead above the code and data to administer the file.

At a basic level libdl loads ELF files. Using ELF object files requires
you to manage the interactions between the object files. This works for
a number of existing applications such as Python where the dlopen
support is used to load self-contained C modules. Dependencies between
object files have to be managed by the user. Loading object files that
reference each other is supported.
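
As a minimal sketch of loading a self-contained module through the
POSIX interface, something like the following could be used. The helper
name load_and_call and the argc/argv entry point signature are
assumptions for the example, and only dlopen, dlsym, dlerror and
dlclose from <dlfcn.h> are relied on.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Entry point signature assumed for the loaded module. */
typedef int (*module_entry)(int argc, char **argv);

/* Load an object file, call one of its functions, then unload it.
 * Returns the function's result, or -1 if loading or lookup fails. */
static int load_and_call(const char *path, const char *symbol,
                         int argc, char **argv)
{
  void *handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
  if (handle == NULL) {
    fprintf(stderr, "dlopen: %s\n", dlerror());
    return -1;
  }

  /* POSIX expects the dlsym() result to be converted like this. */
  module_entry entry = (module_entry) dlsym(handle, symbol);
  int result = -1;
  if (entry != NULL)
    result = entry(argc, argv);
  else
    fprintf(stderr, "dlsym: %s\n", dlerror());

  /* Only safe if the module left no threads or callbacks behind. */
  dlclose(handle);
  return result;
}
```

Any symbols the module references must already be available, either
from the base image symbol table or from object files loaded earlier.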

Real applications will usually be more complex and this requires more
sophisticated host side processing to keep the target processing and
memory requirements to a minimum. The rtems-ld tools can be used to
help. A simple example that highlights the issue is an application made
of 2 object files o1.o and o2.o and a library liba.a with a function
called 'f1'. If both object files reference 'f1', how does the object
file containing 'f1' get loaded? A more realistic example is code in
libc and how this is loaded without having multiple copies in the target.

Incremental linking of the object files with the library can do this and
you can use dlopen to load the resulting object file. This is a nice
simple way to split the application from a standard RTEMS kernel.

Some users will require more flexibility at run time. A user who wants
to build a framework of proven libraries needs something to pull the
application together. The RAP format is provided to help do this. This
is a work in progress. The RAP format strips the ELF format to just the
parts needed on the target and then compresses the data. It can package
a number of object files into a single file without incrementally linking.

Symbol management for the base image can be embedded into the kernel or
it can be loaded at run time. Loading at run time lets a user create a
symbol set that is just the exported symbols for a specific application.

> I can tell you that the implementation was intended to work
> on an ~25 Mhz Coldfire with 2MB RAM. So it isn't that heavy
> on resources.

I think there is more memory than this on the board you are thinking
about. The clock speed is that slow, though, and it raises an important
requirement. The run-time overhead of loaded code should match a
statically linked image. An application that is fully statically linked
should have a similar run-time profile to an application that is
dynamically loaded. This is why we support a link editor that relocates
when loading.

> 
> And the tests assume IMFS and have a tar image of the code being
> dynamically loaded. The test code is small but it has to be
> accounted for.
>  
>> I have included example application with dlopen shell command
>> test to RTEMS OMK template project
>>
>> http://rtime.felk.cvut.cz/gitweb/rtems-devel.git
>>
>> http://rtime.felk.cvut.cz/gitweb/rtems-devel.git/tree/refs/heads/master:/rtems-omk-template/appdl
>>
>>
>> Generally, next commands sequence should work for each
>> RTEMS board
>>
>> git clone git://rtime.felk.cvut.cz/rtems-devel.git
>> cd rtems-devel/rtems-omk-template/
>> # setup target, i.e. path to directory where RTEMS support
>> # for given board is installed
>> echo "RTEMS_MAKEFILE_PATH=/opt/rtems4.11/arm-rtems4.11/lpc17xx_ea_ram" >config.target
>> # setup optional tests (dlopen is not enabled by default in my test)
>> cat >config.omk <<EOF
>> CONFIG_OC_APP_APPNET=y
>> CONFIG_OC_APP_APPNET_TELNETD=y
>> CONFIG_OC_APP_APPDL=y
>> CONFIG_OC_APP_APPDL_NET=y
>> CONFIG_OC_APP_APPDL_TELNETD=y
>> CONFIG_OC_APP_DL_PRINT=y
>> EOF
>>
>> make
>>
>> You find linked application with symbol table exported
>>
>> _compiled/lpc17xx_ea_ram/bin/appdl
>>
>> Inclusion of export table is controlled by per application
>> option applicationname_EXPORTSYMBOLS = y
>> You need the rtems-syms tool from the https://git.rtems.org/rtems-tools/
>> repository installed.
>>
>> The symbol table for ARM is built in my case (see "make V=1" output)
>> by first linking the application without a symbol table
>> (appdl.prelink), then generating the symbol table with rtems-syms and
>> linking again with appdl-symbol-table.o included.
>>
>> Symbol table preparation command
>>
>> rtems-syms -e -c "arm-rtems4.11-gcc --pipe \
>>    -B/opt/rtems4.11/arm-rtems4.11/lpc17xx_ea_ram/lib/ \
>>    -specs bsp_specs -qrtems -march=armv7-m -mthumb \
>>    -I /home/pi/tmp/rtems-devel/rtems-omk-template/_compiled/lpc17xx_ea_ram/include \
>>    -Wall -O2 -g \
>>    -I /home/pi/tmp/rtems-devel/rtems-omk-template/_build/lpc17xx_ea_ram/user/appdl" \
>>    -S appdl-symbol-table.c -o appdl-symbol-table.o appdl.prelink
>>
>> Most of the invocation is the specification of the C compiler
>> and its flags.
>>
>> I have not prepared simple way to specify that some plugins,
>> shared objects should be compiled as relocatable ELFs for
>> our infrastructure yet. I probably try to define it as equivalent
>> of shared_LIBRARIES support which we have in Linux OMK variant.
>>
>> I have used hack for testing now, where I link complete application
>> image, i.e. example for my board
>>
>> _compiled/lpc17xx_ea_ram/bin-tests/appdl_print
>>
>> and copy its (single) object file to IMFS initial data location
>>
>> mkdir appdl/rootfs/bin
>> cp _build/lpc17xx_ea_ram/user/appdl/examples/appdl_print.o appdl/rootfs/bin
>>
>> and rerun
>>
>> make clean all
>>
>> That way the relocatable object file can be found
>> in running shell in directory /bin and can be loaded
>>
>> ls /bin
>> dlopen /bin/appdl_print.o main arg1 arg2 arg3
>>
>> Example can include network, so loading over TFTP,
>> NFS or other RTEMS supported transports is possible
>> as well.
>>
>> There are more related things which I have not
>> solved yet and would be happy for others ideas.
>>
>> The main problem is that I would like to have
>> some generic kernel image which exports not only
>> actually used symbols but ideally all standard
>> POSIX calls and support functions as well as
>> RTEMS services.

The rtems-syms tool in the RTEMS Tools Project
(https://git.rtems.org/rtems-tools/) helps generate symbol tables. You
can decide if the symbol table is embedded in the base image, i.e. a
double link process, or you can create an object file containing the
symbols and load it at run time.

>> One simple and naive solution, which I have tried,
>> is use of "-Xlinker --whole-archive" option
>> during appdl.prelink linking.
>> But that does not work because there are more components
>> and device drivers which cannot be included
>> without application provided configuration tables
>> etc. I know that there has been some support
>> to build a generic RTEMS kernel in the past with
>> the right set of symbols exported (Eric Norum, EPICS).
>> But I have never tried that.

I think we can do better than the whole archive option. We should be
able to collect the various archive object files we need and create a
library we place on the target. During development we have the whole
archive available and for production we have only the archives we want.

>>
>> I can imagine using nm on the RTEMS and Newlib C/m
>> libraries to select in some reasonable way which
>> objects provide generic APIs, and then link these
>> or request their symbols as extern during the prelink
>> phase. But that is not very elegant either.

Please take a look at rtems-ld in the RTEMS Tools Project. It is built
using a C++ framework that supports ELF file management and symbols.
There are a few tools in the linker directory to help manage applications.

>>
>> When I have tried to run dlopen more times on same
>> object then there is problem that some small amount
>> of memory is allocated each time, this means, that
>> use of dlopen directly from some scripting environment
>> without dlclose (which is risky, because called function
>> can create threads) would need some other protection
>> against multiple dlopen calls specifying same file.

This sounds like a bug and a ticket to track this would be nice.
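
Until that is tracked down, the protection against repeated dlopen
calls on the same file could be done in the caller, for example with a
small table keyed by path. The following is a sketch only. All names
in it (module_cache, open_once, counting_open) are made up for the
example, the table sizes are arbitrary, and the opener is passed in as
a callback so the sketch can be run on a host. On the target the
callback would wrap dlopen.

```c
#include <stddef.h>
#include <string.h>

#define MAX_MODULES  8
#define MAX_PATH_LEN 64

/* Opener callback; on the target this would wrap dlopen(). */
typedef void *(*open_fn)(const char *path);

typedef struct {
  char   path[MAX_MODULES][MAX_PATH_LEN];
  void  *handle[MAX_MODULES];
  size_t used;
} module_cache;

/* Return the cached handle for path, opening the file only on first use. */
static void *open_once(module_cache *cache, const char *path, open_fn opener)
{
  for (size_t i = 0; i < cache->used; ++i) {
    if (strcmp(cache->path[i], path) == 0)
      return cache->handle[i];
  }
  if (cache->used == MAX_MODULES || strlen(path) >= MAX_PATH_LEN)
    return NULL; /* table full or path too long for this sketch */

  void *handle = opener(path);
  if (handle != NULL) {
    strcpy(cache->path[cache->used], path);
    cache->handle[cache->used] = handle;
    cache->used++;
  }
  return handle;
}

/* Host-side stand-in for dlopen() that counts how often it is called. */
static int open_calls;

static void *counting_open(const char *path)
{
  (void) path;
  ++open_calls;
  return &open_calls; /* any non-NULL pointer serves as a fake handle */
}
```

The comparison is on the path string only, so two different paths that
name the same file would still be opened twice.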

>>
>> But generally, I have been really pleased how simple
>> RTEMS dlopen is to use and how closely it complies
>> with POSIX and heavier OSes such as Linux.
>>

Nice to hear this.

Chris