RTEMS LEON2 EDAC Trap Management

Tue Dec 1 17:07:40 UTC 2009

Hi,

I do it this way: inside the trap handler I disable interrupts and
re-read the failing address. The read raises a second
(correctable-error) trap, but it is only marked as pending. The EDAC
returns the corrected value, so I can now write that value back to
memory.

Then I clear the pending correctable memory error bit, re-enable
interrupts and continue normal execution.

Hope it helps.

void isr_handler (void)
{
.....
  // Correct memory atomically
  pil = interrupts_disable ();

  // Re-read the failing address (the EDAC returns the corrected value)
  // and store it back again.
  volatile unsigned int *address = (volatile unsigned int *) failing_address;
  unsigned int value = *address;
  *address = value;

  // Clear the pending correctable-error interrupt raised by the read above
  interrupt_clear (correctable_mem_error);
  interrupts_enable (pil);

   .....
}
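
For trying out the sequence off-target, here is a minimal host-side
sketch of the same read/write-back scrub. Everything in it is a
stand-in I made up for illustration: the masked and pending_correctable
variables and the three helper functions only mimic the interrupt mask
and the pending bit, and are not an RTEMS or LEON2 API. On real
hardware the corrected value comes from the EDAC itself during the
read.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the interrupt controller state. */
static unsigned int masked = 0;              /* 1 while interrupts are masked */
static unsigned int pending_correctable = 0; /* 1 while the error is pending  */

/* Mask interrupts and return the previous mask state. */
static unsigned int interrupts_disable(void)
{
    unsigned int old = masked;
    masked = 1;
    return old;
}

/* Restore the mask state saved by interrupts_disable(). */
static void interrupts_enable(unsigned int pil)
{
    masked = pil;
}

/* Clear the pending correctable-error indication. */
static void interrupt_clear_correctable(void)
{
    pending_correctable = 0;
}

/* Scrub one word: read it, write it back, clear the pending error. */
static uint32_t scrub_word(volatile uint32_t *address)
{
    unsigned int pil = interrupts_disable();

    /* On the target this read returns the EDAC-corrected data. */
    uint32_t value = *address;

    /* Simulate the read flagging a correctable error as pending. */
    pending_correctable = 1;

    /* The write-back stores the corrected word with fresh check bits. */
    *address = value;

    interrupt_clear_correctable();
    interrupts_enable(pil);
    return value;
}
```

The point of the ordering is that the pending bit is cleared only
after the write-back, and interrupts stay masked in between, so the
handler is not re-entered for the error it is in the middle of fixing.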

On Tue, Dec 1, 2009 at 17:24, Leonard Bise <leonard.bise at syderal.ch> wrote:
>
> Hi all,
>
> I'm working on an RTEMS application running on a LEON2 processor.
> I have some issues in regard to the Double EDAC error trap (0x09)
> management.
>
> I'm trying to validate that my application correctly manages EDAC errors by
> launching my application, breaking, generating an EDAC error and then
> resuming.
> My application then detects that an error has happened and launches the
> correct trap handler.
>
> After the trap has been handled, we resume normal execution by executing the
> rett instruction (return from trap), with the address of the next
> instruction after the one that has been trapped.
> However, instead of continuing execution after the rett, the same
> instruction that triggered the first error triggers another error (0x11),
> which should be a correctable EDAC error.
>
> This behavior then loops forever (Double EDAC, then Single EDAC, etc.) and
> after some time my application resets (approximately 32'000 traps are
> triggered for each one before it resets).
>
> I've been looking around for help in the LEON "community" but I could not
> find much. I know it is not necessarily RTEMS related, but a colleague
> of mine who does the exact same thing on another project, but only in C, has
> no problem, so I'm wondering what might be causing this.
>
> For info here is my double edac trap handler.
>
> static void BGD_trap_DE_handler(void) {
>     {
>         /*#[ operation BGD_trap_DE_handler() */
>         volatile uint32*          failing_address;
>         boolean accepted = FALSE;
>
>         /* read Failure Address Register */
>         asm("TEST_EDAC_DOUBLE_START:nop;");
>         failing_address = *FAILAR;
>         FIFO_edac_address [FIFO_edac_next] = (uint32)failing_address;
>         FIFO_edac_double [FIFO_edac_next] = 1;
>         FIFO_edac_nb = (FIFO_edac_nb & 0xF) + 1;
>         /* Note: in case more than 16 errors occur during 1 second, the 16
>            oldest will be lost. */
>         FIFO_edac_next = (FIFO_edac_next+1) & 0xF;
>
>         /* SYD MOD */
>         if ((failing_address < START_EEPROM) ||
>             (failing_address >= 0x80000000))
>         {
>             BGD_process_d_edac_sram();
>         }
>         else
>         {
>             BGD_process_d_edac_eeprom();
>         }
>         /* reset fail status register */
>         *FAILSR = 0;
>         *FAILAR = 0;
>
>         //BDT_abort_request (&accepted);
>         /*#]*/
>                 asm("TEST_EDAC_DOUBLE_END:nop;");
>     }
> }
>
> Here is a disassembly of the instruction that triggers the Double EDAC
> trap, which is the expected behavior:
>  1046115464  40001400  cmp  %i0, 15                [00000ff1]
>  1046115466  40001404  bgu  0x40001414             [00000000]
>  1046115467  40001408  nop                         [00000000]
>  1046115468  40001414  add  %fp, -20, %i5          [401e5f34]
>  1046115473            ahb read,  mst=0, size=2    [401e5f34 40053854]
>  1046115474  40001418  ld  [%i5], %i2              [40053854]
>  1046115475  4000141c  add  %fp, -24, %i4          [401e5f30]
>  1046115480            ahb read,  mst=0, size=2    [401e5f30 40100000]
>  1046115481  40001420  ld  [%i4], %i1              [40100000]
>  1046115482  40001424  mov  %i2, %i3               [40053854]
>  1046115483  40001428  mov  %i1, %i0               [40100000]
>  1046115491            ahb read,  mst=0, size=2    [40100000 a6102003]
>  1046115492  4000142c  ld  [%i0], %i0              [trapped]
>
> Here is the second trap triggered, which should not happen:
>  1046117903  4002bdc8  mov  %l0, %psr              [000000c4]
>  1046117904  4002bdcc  nop                         [00000000]
>  1046117905  4002bdd0  nop                         [00000000]
>  1046117906  4002bdd4  nop                         [00000000]
>  1046117911            ahb read,  mst=0, size=2    [401e5e84 00000000]
>  1046117912  4002bdd8  ld  [%g1 + 0x6c], %g1       [00000000]
>  1046117913  4002bddc  jmp  %l1                    [4002bddc]
>  1046117914  4002bde0  rett  %l2                   [40001430]
>  1046117916  4000142c  ld  [%i0], %i0              [trapped]
>
> I hope someone can help ;)
>
> Léonard.