TMS570 and forwarding of exception vectors from bootloader to application

Sun Nov 8 22:12:20 UTC 2015

Many use cases require that device firmware can
be updated in deployed applications without the need
to open an (often sealed) control unit.

Generic foreword
------------------------------------------------
Updating directly from the application by reading the new
image into RAM and then running some Flash programming code
from RAM is quite risky. If the update fails, the device is lost.
Another problem is that enough RAM has to be available
to store the application. Another option (used on nRF51,
for example) is to store the new application in another part
of Flash and, if the image is correct, to copy it over
the final application location.

But generally, startup code (a loader) which starts first,
runs functional tests of the core HW, checks integrity of
the main application in memory and transfers execution
to its start is the better option. Even if an application
update fails, the boot code can provide a mechanism to receive
the application over some communication interface and
write it to memory.

Linking the application to a different start address
is usually easy. But the addresses of exception vectors
or interrupt entry points have to agree between the hardware
and the application image. If the CPU architecture provides
a mechanism to select the exception table start address
(m68k VBR, Cortex-M VTOR register) then achieving
the match between HW and SW is easy.
------------------------------------------------

The TMS570 family is based on the Cortex-R4 CPU core, which
supports only two alternative primary exception
table start addresses. So the agreement between
SW located after the loader and the target addresses executed
by HW during an exception is much more complicated.

Here is a list of possible solutions to (re)target
exceptions to the right addresses in an application which
is not the primary boot code on TMS570LS3137:

1) Reserve area in RAM which holds exception table
   ===============================================

The architecture directs execution to addresses
0x4, 0x8 ... 0x1C for the different exception causes.
RAM can hold words with the target address for each
cause. But loading an address directly from SRAM
at address 0x08000000 ... is not possible with a single
ldr pc, [#0x0....] or ldr pc, [pc, #0x...] instruction
because such a far offset does not fit into the 32-bit
instruction encoding (the immediate offset is limited
to 12 bits, that is 4095 bytes).
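
The range limit can be checked on a host with a few lines
of Python. This is only a sketch: the helper name
ldr_literal_reachable is made up for illustration, and it
assumes the ARM (A32) encoding, where the literal load uses
a 12-bit offset relative to the instruction address plus 8.

```python
# Hypothetical helper, not part of any toolchain: checks whether an
# ARM-mode (A32) "ldr pc, [pc, #imm]" placed at instr_addr can read
# a word stored at word_addr.

def ldr_literal_reachable(instr_addr, word_addr):
    # In ARM state the PC reads as the instruction address plus 8.
    offset = word_addr - (instr_addr + 8)
    # The encoding holds a 12-bit magnitude plus an add/subtract bit.
    return -4095 <= offset <= 4095


# Undefined instruction vector at 0x4, target word in SRAM.
print(ldr_literal_reachable(0x4, 0x08000004))
# Same vector, target word placed right behind the table in Flash.
print(ldr_literal_reachable(0x4, 0x24))
```

The first call reports that the SRAM word is out of reach,
the second that a word placed next to the vectors is fine.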

Even code which loads the address of the vector location
into a register and then uses ldr pc, [rX] is problematic,
because the only registers banked automatically for each
exception mode on ARM (FIQ aside) are lr and sp, but lr
contains the return address required for exit from exception
processing and sp is usually preset to the location of
the stack reserved for the given mode. Code to forward
execution to an address specified in RAM would be quite complex.

That is why the usual solution of this problem with RAM
is to fill the exception table in Flash with instructions

  ldr pc, [pc, #0x18]

followed by target addresses in RAM (0x08000004,
0x08000008, ...). Execution continues in RAM
where another set of ldr pc, [pc, #0x18] instructions
is located, followed by the entry addresses of the actual
exception handler service functions. Such a setup allows
runtime update of vectors and use of different sets
in the bootloader and the application.
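
The double indirection can be followed in a small host-side
model. It is a sketch under assumptions: memory is a plain
dict of words, the handler addresses are made up, and only
the A32 instruction ldr pc, [pc, #imm12] (encoding 0xE59FFnnn)
is interpreted.

```python
# Host-side model only; the handler addresses below are invented.
LDR_PC_PC_0X18 = 0xE59FF018   # A32 encoding of "ldr pc, [pc, #0x18]"
SRAM_BASE = 0x08000000


def build_memory(handlers):
    """Fill Flash and SRAM tables for vectors 0x4 ... 0x1C."""
    mem = {}
    for i, handler in enumerate(handlers, start=1):
        vec = 4 * i
        mem[vec] = LDR_PC_PC_0X18               # Flash vector
        mem[vec + 0x20] = SRAM_BASE + vec       # Flash word: SRAM trampoline
        mem[SRAM_BASE + vec] = LDR_PC_PC_0X18   # SRAM trampoline
        mem[SRAM_BASE + vec + 0x20] = handler   # SRAM word: real handler
    return mem


def follow(mem, pc):
    """Follow ldr pc, [pc, #imm12] hops until another word is reached."""
    hops = 0
    while (mem.get(pc, 0) & 0xFFFFF000) == 0xE59FF000:
        imm12 = mem[pc] & 0xFFF
        pc = mem[pc + 8 + imm12]    # PC reads as instruction address + 8
        hops += 1
    return pc, hops


handlers = [0x00040100 + 0x40 * n for n in range(7)]
mem = build_memory(handlers)
target, hops = follow(mem, 0x18)    # start at the IRQ vector
print(hex(target), hops)
```

Starting at the IRQ vector, the model needs two hops before
it lands on the handler address stored in SRAM, which is
the extra latency mentioned in the text.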

There are some drawbacks as well. The vectors are not protected
against rewrite by a misbehaving application, and the two
indirect jumps add latency to interrupt service.

There is another, more serious obstacle to using this approach
on the TMS570 Safety MCU family. Internal SRAM trampolines
are incompatible with the check of the TCRAM1 ECC error detection
logic during startup. This check intentionally creates
a situation where the internal SRAM ECC logic signals a data
abort to verify that HW fault detection is functional.
If this data abort leads to a jump to a trampoline in the SRAM
under test, then it leads to another abort and the CPU locks up.
This check is included in the standard startup sequence generated
by TI's HALCoGen.

Additionally, my colleagues implementing a firmware loader
over XCP for automotive applications have some objections
against jumping into SRAM as well.

=> possible but has problems

2) Use alternative Cortex-A/Cortex-R exception table location
   ==========================================================

When bit V in the Cortex-R architectural SCTLR control register
is set, the CPU uses the alternative location for the exception
table at addresses 0xFFFF0000 ... 0xFFFF001C. This is perfect for
MMU equipped chips, which can remap some RAM to this area; it
is usually above the limit of the user space range and belongs to
the common kernel mapping. But on TMS570 this range falls into
the PMM peripheral registers.

=> no option.

3) Bit BMMCR.MEMSW to swap internal SRAM and Flash regions
   =======================================================

The Bus Matrix Module Control Register (BMMCR) contains a bit
which allows swapping of the Flash and SRAM regions.
When the bit is set, the SRAM start address is changed to 0
and the Flash start address to 0x08000000 after the next soft
reset. This CPU soft reset can be initiated by a bit
in the CPURSTCR register.

One of my colleagues has played with such a setup, but when
it is used even in later parts of the bootloader, he
encountered problems with the binary-only Flash programming
library provided by TI. If the loader runs in the default mapping,
then it needs to translate the addresses included
in a programming request to the actual addresses used to store
the application in Flash. The application has to be linked to
address 0x08000000 + loader size (for example 0x08040000),
but it is programmed to 0x00040000.
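
The translation itself is a plain base swap. The sketch
below shows it on the host; the function name and the
constant names are hypothetical, and the 3 MB Flash size
used for the range check is an assumption about the
TMS570LS3137 that should be checked against the datasheet.

```python
# Hypothetical names; only the two base addresses come from the text.
FLASH_BASE_DEFAULT = 0x00000000   # Flash start in the default mapping
FLASH_BASE_SWAPPED = 0x08000000   # Flash start while MEMSW is active
FLASH_SIZE = 0x00300000           # assumed 3 MB of program Flash


def link_to_program_address(link_addr):
    """Map an address from the swapped view to where the loader writes."""
    offset = link_addr - FLASH_BASE_SWAPPED
    if not 0 <= offset < FLASH_SIZE:
        raise ValueError("address 0x%08x is outside swapped Flash" % link_addr)
    return FLASH_BASE_DEFAULT + offset


print(hex(link_to_program_address(0x08040000)))
```

A loader running in the default mapping would apply such
a function to every address of a programming request before
it calls the Flash library.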

TI uses this setup in their loader to start the application,
so it is probably a well tested solution.

=> possible but quite complex

4) Exception table bypass for interrupts supported by VIM
   ======================================================

When bit VE in the Cortex-R architectural control register SCTLR
is set to one, there is no instruction fetch at the start
of peripheral/external interrupt processing and the exception
handler entry point is loaded into PC directly by the VIM
(Vectored Interrupt Manager). This simplifies and speeds up
directing of execution to the right service routine, but RTEMS
requires that all interrupts are serviced through the common
_ARMV4_Exception_interrupt handler to ensure that task switching
on exit from an interrupt works correctly if a reschedule is
requested during interrupt processing.
That means that the vectors stored in VIM RAM cannot point
directly to individual peripheral handlers and need to point
back to the single entry path. But if TMS570_VIM.IRQINDEX is then
used to identify the source and direct execution to the
corresponding service routine, then for some peripherals (EMAC
for example) the interrupt is already acknowledged by the VIM
and IRQINDEX reads as zero, which leads to a spurious interrupt
and a peripheral which is not serviced and stays blocked.

We have found a workaround with individual
trampolines for problematic peripherals/vectors, which set
a global variable first and then direct execution
to the regular RTEMS _ARMV4_Exception_interrupt, which
retrieves the stored value. But such a solution has program
space and execution time overhead, and using it for all
vectors (as is usually done on x86, for example)
would be quite ugly.
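
The idea of the workaround can be modelled on a host as
follows. This is not RTEMS source, only a sketch: all names
are hypothetical, the vector number is made up, and
read_irqindex() models just the problematic case where
the request has already been acknowledged.

```python
# Host-side model of the trampoline workaround; all names are invented.
saved_vector = 0    # global written by the per-vector trampolines
serviced = []       # what the common entry path decided to service


def read_irqindex():
    # Models the failing case: the VIM has already acknowledged
    # the request, so the index register reads as zero.
    return 0


def common_interrupt_entry():
    """Stand-in for the single entry path required by RTEMS."""
    global saved_vector
    vector = saved_vector or read_irqindex()
    saved_vector = 0
    serviced.append(vector if vector else "spurious")


def make_trampoline(vector):
    """Build the stub whose address would be stored in VIM RAM."""
    def trampoline():
        global saved_vector
        saved_vector = vector       # remember the source first
        common_interrupt_entry()    # then take the common path
    return trampoline


EXAMPLE_VECTOR = 78                 # made-up vector number
common_interrupt_entry()            # direct entry: source is lost
make_trampoline(EXAMPLE_VECTOR)()   # entry through the trampoline
print(serviced)
```

The direct entry ends as a spurious interrupt, while
the entry through the trampoline still knows its source.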

Bypassing the interrupt vectors does not solve setup of the other
first-level exception table entries. Undefined instruction etc.
have to be serviced directly by the loader. On the other hand,
RTEMS, thanks to resolving rescheduling in the common IRQ handler
path, does not require SVC or any other vector for its normal
operation (in contrast to other embedded RT executives).

=> partial solution only, fragile and with known problems

5) Use of POM to map part of SRAM over standard exception table
   ============================================================

The Parameter Overlay Module (POM) allows replacing an area
of Flash with some other memory source. It is intended for
parameter tuning during development. The smallest remapped area
is 64 bytes; 256 kB is the upper limit.

The RTEMS standard exception processing for ARMv4 and above uses
the standard sequence of ldr pc, [pc,#0x18] instructions directly
followed by the target locations. This represents 32 + 32 bytes
and can easily be remapped to address 0.
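
The 32 + 32 byte layout can be reproduced on a host. This
is a sketch with assumptions: the function names and the
handler addresses are made up, and the words are packed
big-endian because TMS570 is a big-endian device.

```python
import struct


def encode_ldr_pc_literal(imm12):
    """A32 encoding of "ldr pc, [pc, #imm12]" with a positive offset."""
    if not 0 <= imm12 <= 0xFFF:
        raise ValueError("offset does not fit into 12 bits")
    return 0xE59FF000 | imm12


def build_vector_table(handlers):
    """Return 8 instructions followed by 8 handler address words."""
    if len(handlers) != 8:
        raise ValueError("expected 8 handler addresses")
    words = [encode_ldr_pc_literal(0x18)] * 8 + list(handlers)
    return struct.pack(">16I", *words)   # big-endian words (assumed)


example_handlers = [0x08100000 + 0x100 * n for n in range(8)]  # made up
table = build_vector_table(example_handlers)
print(len(table), table[:4].hex())
```

The resulting image is exactly 64 bytes long, which matches
the smallest area the POM can remap.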

We liked that option because it allows us to use any
of our other project applications to set up SDRAM, then
load an RTEMS image into the prepared system with OpenOCD,
and start and debug the application without the need to care
about the content of the start of Flash.

The problem is that it works only sometimes. The POM is not
intended to fulfill instruction fetches. Our explanation is that,
depending on pressure on the Flash speed-up buffers (cache), there
is sometimes a collision between previous code execution,
the exception ldr instruction fetch and the following target
address read. The result is an unpredictable CPU lockup after
some random time.

=> no option.

6) Use of POM to remap exception handlers target addresses only
   ============================================================

If the content of the start of Flash is well defined, in our
case filled with a sequence of

  ldr pc,[pc,#0x58]

then remapping the area 0x40 to 0x7f allows directing
execution to the right exception handlers. The RTEMS image starts
with an exactly matching set of constants after the
ldr pc, [pc,#0x18] sequence, so a simple copy of

   bsp_start_vector_table_begin  ... bsp_start_vector_table_begin + 0x40

to SRAM and then its remap to address 0x40 works.
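
The arithmetic behind the 0x58 offset can be verified with
a short host-side check. It is a sketch only: the names
are made up, and it relies on the A32 rule that the PC
reads as the instruction address plus 8.

```python
# Hypothetical names; the numeric values come from the text above.
OVERLAY_START = 0x40   # Flash window replaced by SRAM through the POM
OVERLAY_SIZE = 0x40    # bytes copied from bsp_start_vector_table_begin


def literal_address(vector_addr, imm12):
    """Address read by "ldr pc, [pc, #imm12]" located at vector_addr."""
    return vector_addr + 8 + imm12


def overlay_check():
    """Pair each loader literal with its offset inside the RTEMS copy."""
    rows = []
    for vector in range(0x04, 0x20, 4):
        flash_word = literal_address(vector, 0x58)   # loader in Flash
        rtems_word = literal_address(vector, 0x18)   # inside RTEMS table
        rows.append((vector, flash_word, flash_word - OVERLAY_START,
                     rtems_word))
    return rows


for vector, flash_word, copy_offset, rtems_word in overlay_check():
    print(hex(vector), hex(flash_word), hex(copy_offset), hex(rtems_word))
```

For every vector the word read by the loader lies in
the window 0x40 to 0x7f, and its offset in the copied table
equals the offset which the RTEMS ldr pc, [pc,#0x18] would
use, so both layouts agree.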

This solution is quite elegant. It does not require execution
of code from SRAM and latency is minimal, with only one jump
instruction, which would be there even in the case of an RTEMS
image starting at address 0. There is no need to translate
addresses between image and Flash location during programming,
etc.

But the POM is not intended for this use either, so it
can have consequences in losing safety advantages and guarantees
of the CPU system.

A disadvantage is that the startup code of the loader in Flash
has to use the matching instruction for the vectors at addresses
0x4, 0x8 .. 0x1c. The code which we use for now is compatible
with the rest of the TI generated code. The source can be seen here:

https://github.com/hornmich/tms570ls3137-hdk-sdram/blob/master/SDRAM_SCI_configuration/source/sys_intvecs.asm

=> this is the solution which we use for now and it is reasonable
   at least for application debugging