[PATCH v1 2/5] cpukit: Add Exception Manager

Tue Aug 31 09:31:29 UTC 2021

On 30/08/2021 17:13, Kinsey Moore wrote:
> On 8/30/2021 07:50, Sebastian Huber wrote:
>> On 30/08/2021 14:27, Kinsey Moore wrote:
>>> On 8/30/2021 00:42, Sebastian Huber wrote:
>>>> Hello Kinsey,
>>>>
>>>> why can't you use the existing fatal error extension for this? You 
>>>> just have to test for an RTEMS_FATAL_SOURCE_EXCEPTION source.  The 
>>>> fatal code is a pointer to the exception frame.
>>>
>>> Unfortunately, the fatal error extensions framework necessarily 
>>> assumes that the exception is fatal and so does not include the 
>>> machinery to perform a thread dispatch or restore the exception frame 
>>> for additional execution. It could theoretically be done in the fatal 
>>> error extensions context, but it would end up being reimplemented for 
>>> every architecture and you'd have to unwind the stack manually. I'm 
>>> sure there are other ragged edges that would have to be smoothed over 
>>> as well.
>>
>> Non-interrupt exceptions are not uniformly handled across 
>> architectures in RTEMS currently. Adding the 
>> RTEMS_FATAL_SOURCE_EXCEPTION fatal source was an attempt to do this. I 
>> am not that fond of adding a second approach unless there are strong 
>> technical reasons to do this.
> This was in an effort to formalize how recoverable exceptions are 
> handled. Currently, it's done on SPARC by handling exception traps as 
> you would an interrupt trap since they share a common architecture on 
> that platform. This representation varies quite a bit among platforms, 
> so we needed a different mechanism.

I recently changed the non-interrupt exception handling on SPARC, since 
it was not robust against a corrupt stack pointer:

http://devel.rtems.org/ticket/4459

>>
>> The initial fatal extensions are quite robust, you only need a stack, 
>> valid read-only data and valid code. So, using a user extension is 
>> the right thing to do, but I don't think we need a new one.
>>
>> Doing the non-interrupt exception processing on the stack which caused 
>> the exception is a bit problematic, since the stack pointer might be 
>> corrupt as well. It is more robust to switch to, for example, the 
>> interrupt stack. If the exception was caused by an interrupt, then 
>> this exception is not recoverable.
> 
> The non-interrupt exception processing occurs on the interrupt stack, 
> not the thread/user stack. In the AArch64 support code provided, the 
> stack is switched back to the thread/user stack before thread dispatch 
> and exception frame restoration occurs.

You can only switch back to the thread stack if it is valid. Doing a 
thread dispatch should be only done if you are sure that the system 
state is still intact. This is probably not the case for most exceptions.
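
A minimal sketch of such a validity check, assuming the saved stack pointer and the bounds of the interrupted thread's stack are available to the exception code. The type stack_area and the function saved_sp_is_plausible() are hypothetical names for illustration and not part of RTEMS. A passing check only shows that the pointer is plausible, it says nothing about the stack contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a thread stack area, not an RTEMS type. */
typedef struct {
  uintptr_t begin; /* lowest address of the stack area */
  size_t    size;  /* size of the stack area in bytes */
} stack_area;

/*
 * Returns true if the saved stack pointer is aligned, lies within the
 * stack area, and leaves at least "reserve" bytes below it.  This
 * assumes a stack which grows towards lower addresses.
 */
bool saved_sp_is_plausible(
  uintptr_t         sp,
  const stack_area *area,
  size_t            alignment,
  size_t            reserve
)
{
  uintptr_t offset;

  assert( area != NULL );

  if ( alignment == 0 || ( sp % alignment ) != 0 ) {
    return false;
  }

  if ( sp < area->begin ) {
    return false;
  }

  /* Working with the offset avoids an overflow in begin + size */
  offset = sp - area->begin;

  if ( offset > area->size ) {
    return false;
  }

  return offset >= reserve;
}
```

If the check fails, the only sensible option is to stay on the dedicated stack and terminate the system.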

> 
> 
>>
>> If the non-interrupt exception was caused by a thread, then you could 
>> do some high level actions for some exceptions, such as floating-point 
>> exceptions and arithmetic exceptions. If you get a data abort or 
>> instruction error, then it is probably better to terminate the system.
> I leave that decision to the handlers defined on this framework. In the 
> case of the exception-to-signal mapping, I'm carrying over the existing 
> exception set from the SPARC implementation.

It is probably this code:

+    case EXCEPTION_DATA_ABORT_READ:
+    case EXCEPTION_DATA_ABORT_WRITE:
+    case EXCEPTION_DATA_ABORT_UNSPECIFIED:
+    case EXCEPTION_INSTRUCTION_ABORT:
+    case EXCEPTION_MMU_UNSPECIFIED:
+    case EXCEPTION_ACCESS_ALIGNMENT:
+      signal = SIGSEGV;
+      break;
+
+    default:
+      /*
+       * Covers unknown, PC/SP alignment, illegal execution state, and any new
+       * exception classes that get added.
+       */
+      signal = SIGILL;
+      break;
+  }

Using signals to handle these exceptions is like playing Russian roulette.

>>
>> Non-interrupt exception handling is always architecture-dependent. It 
>> is just a matter of how you organize it. In general, the most sensible 
>> way to deal with non-interrupt exceptions is to log the error somehow 
>> and terminate the system. The mapping to signals is a bit of a special 
>> case if you ask me. My preferred way to handle non-interrupt 
>> exceptions would be to
>>
>> 1. switch to a dedicated stack
>>
>> 2. save the complete register set to the CPU exception frame
>>
>> 3. call the fatal error extensions with RTEMS_FATAL_SOURCE_EXCEPTION 
>> and the CPU exception frame (with interrupts disabled)
>>
>> Add a new API to query/alter the CPU exception frame, switch to the 
>> stack indicated by the CPU exception frame, and restore the context 
>> stored in the CPU exception frame. With this architecture-dependent 
>> CPU exception frame support it should be possible to implement a high 
>> level mapper to signals.
>>
> What you've described is basically what is happening here (the dedicated 
> stack is currently the interrupt/exception stack on AArch64), but the 
> low level details are necessarily contained within the CPU port in patch 
> 3/5. Support for this framework is not required for any CPU port, but 
> CPU ports that do support it repurpose the existing code underlying the 
> fatal error extensions with the additional support you described above. 

I don't think that looking at existing code is the right thing to do. 
The exception handling is too diverse in RTEMS. We should think about 
what a clean design should look like.

> This does not exist in parallel to the fatal error extensions, but 
> rather the fatal error extensions are moved on top of the Exception 
> Manager for CPU ports that support it. The Exception Manager returns 
> whether the exception was handled and the CPU port then calls the fatal 
> error extensions if the exception wasn't handled. With this patch set, 
> only an accessor was added to get the exception class, but my initial 
> thoughts included manipulation of the execution address and several 
> other more generic manipulators.

If a non-interrupt exception occurs, the default behaviour should be to 
terminate the system as robustly and safely as possible. Raising a signal 
should be optional and not make the exception handling less robust. The 
support for the signals should also not lead to dead code in the default 
case. This is why I proposed a two step approach. The first step is a 
normal fatal error handler. The second step is a resume of normal 
multitasking in a special signal fatal error extension using an 
architecture-specific "jump" which is defined by the CPU exception frame.
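
The two steps can be modelled roughly as follows. This is only a host-side sketch of the control flow, not RTEMS code: all types and functions (exc_frame, signal_extension, handle_exception(), and so on) are hypothetical, and the architecture-specific "jump" is reduced to a return value. The choice of recoverable exception classes is just an example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names below are hypothetical and only model the proposed flow. */
typedef enum {
  EXC_FLOATING_POINT,
  EXC_ARITHMETIC,
  EXC_DATA_ABORT,
  EXC_INSTRUCTION_ERROR
} exc_class;

typedef struct {
  exc_class cls;
  bool      from_interrupt;     /* raised while servicing an interrupt */
  bool      thread_stack_valid; /* saved stack pointer looks plausible */
} exc_frame;

typedef enum { ACTION_TERMINATE, ACTION_RESUME } exc_action;

/* Optional second step: may request a resume of normal multitasking */
typedef exc_action ( *signal_extension )( const exc_frame *frame );

exc_action signal_extension_example( const exc_frame *frame )
{
  /* Never resume if the interrupted context cannot be trusted */
  if ( frame->from_interrupt || !frame->thread_stack_valid ) {
    return ACTION_TERMINATE;
  }

  switch ( frame->cls ) {
    case EXC_FLOATING_POINT:
    case EXC_ARITHMETIC:
      return ACTION_RESUME;
    default:
      return ACTION_TERMINATE;
  }
}

/*
 * First step: normal fatal error handling.  Without an installed
 * extension the system terminates, so the default case carries no
 * signal-related code.
 */
exc_action handle_exception( const exc_frame *frame, signal_extension ext )
{
  assert( frame != NULL );

  if ( ext == NULL ) {
    return ACTION_TERMINATE;
  }

  return ext( frame );
}
```

In a real implementation, ACTION_RESUME would correspond to restoring the context stored in the CPU exception frame, and ACTION_TERMINATE to continuing with the remaining fatal error extensions.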

-- 
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.huber at embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Court of registration: Amtsgericht München
Registration number: HRB 157899
Authorized managing directors: Peter Rasmussen, Thomas Dörfler
Our privacy policy can be found here:
https://embedded-brains.de/datenschutzerklaerung/