[GSoC 2014] Paravirtualization layer in RTEMS

Youren Shen shenyouren at gmail.com
Tue Mar 11 11:53:34 UTC 2014


On Tue, Mar 11, 2014 at 3:46 PM, Philipp Eppelt <
philipp.eppelt at mailbox.tu-dresden.de> wrote:

> On 03/11/2014 01:28 AM, Gedare Bloom wrote:
> > On Mon, Mar 10, 2014 at 6:48 PM, Philipp Eppelt
> > <philipp.eppelt at mailbox.tu-dresden.de> wrote:
> >> On 03/10/2014 04:24 PM, Youren Shen wrote:
> >>> What confuses me is the relation
> >>> between pok_arch_event_register and pok_meta_handler_init. It seems you
> >>> divided the IRQ vectors into two parts in pok_arch_event_register: less
> >>> than 32 or more than 32. It looks like you have already designed some
> >>> hypercall interface (like pok_irq_prologue_0 for the clock?). But what is
> >>> the purpose of pok_meta_handler_init? I still can't understand it very
> >>> clearly. Could you give me an outline of the IRQ handling in POK that
> >>> invokes these two functions?
> >>>
> >>> If you can give me a brief overview of how you approached
> >>> these issues and a brief description of your design, it will be
> >>> really helpful to me.
> >>
> >> There are 16 (0 - 15) interrupt lines for hardware interrupts on x86.
> >> If a line is triggered, the PIC sends an interrupt to the CPU.
> >> If interrupts are enabled, the CPU asks for the interrupt number and
> >> looks up this number in the Interrupt Descriptor Table (IDT).
> >> The IDT for HW interrupts looks like this:
> >> 32 | clock  ISR (Interrupt Service Routine)
> >> 33 | keyboard ISR
> >> 34 | ...
> >> ...
> >> 47 | ...
> >>
> >> Intel reserved the first 32 (0-31) interrupt vectors for CPU exceptions,
> >> so we start at 32 and go to 47. 32 corresponds to IRQ line 0, which is
> >> the clock interrupt. 33 is line 1, the keyboard (if I can trust my memory).
> >>
> >> Now the CPU never tells you which IRQ line fired. Therefore, we register
> >> prologue functions with the IDT. Each prologue knows its line number,
> >> pushes it on the stack and calls a general ISR handler.
> >> This general ISR handler checks the line number and calls the handler
> >> registered for this line. To do so, the general ISR handler maintains
> >> its own IDT, a software IDT.
> >> This enables us to register more than one ISR handler function for one
> >> interrupt line. For example, to handle the clock tick in the kernel and
> >> tell the guest system (RTEMS) running in a partition that a clock tick
> >> occurred (two handlers).
> >>
> >> But we don't want the POK kernel to wait until the partition has handled
> >> the interrupt. So we acknowledge the interrupt with the PIC and then
> >> send the partition the soft-interrupt. Here we go from kernel to
> >> user space, and this is the point where I left off.
> >>
> >> To be more specific in terms of source code:
> >> 'pok_arch_event_register' is called if you want to register any kind of
> >> interrupt with the IDT. If this happens to be in the hardware interrupt
> >> range [32-47], it registers a prologue handler with the IDT.
> >>
> >> All pok_irq_prologue functions call _ISR_Handler, which in turn calls
> >> _C_isr_handler. Together these form the general handler: first the asm
> >> part, then the C part.
> >> The _C_isr_handler checks if the kernel has registered a handler for
> >> this IRQ number and calls it.
> >> Then it checks if the current partition has interrupts enabled, if
> >> there is a handler registered and if the partition isn't already
> >> servicing an earlier interrupt.
> >> If so, the registered handler is invoked.
> >>
> >> When I talk about a 'registered handler', I mean the
> >> software IDT the kernel is maintaining.
> >> The software IDT for hardware interrupts is a static table consisting of
> >> 16 entries of the type 'meta_handler'.
> >> 'meta_handler' is a struct consisting of a vector number and two tables
> >> of the size "kernel + configured number of partitions".
> >> The first table holds function pointers pointing to the
> >> partition's/kernel's handler function, the
> >> what-to-do-if-IRQ-occurs function.
> >> The second table flags if the partition is ready for an interrupt.
> >>
> >> So for each interrupt entry in our software IDT, we get a 'meta_handler'
> >> encapsulating a line number, a table with up to one handler per
> >> partition and a table flagging whether each partition is ready for
> >> interrupts.
> >>
> >> Next to this software IDT, there is a table 'partition_irq_enabled',
> >> which has one flag per partition and is the software replacement for
> >> CLI/STI.
> >>
> >> 'pok_meta_handler_init' sets up the software IDT and fills all fields
> >> with start values (magic unused vector number, no handler present, but
> >> waiting).
> >> 'pok_partition_irq_init' sets up the partition_irq_enabled table with the
> >> value for disabled (0), so initially no partition gets interrupts until
> >> it asks for them.
> >>
> >>
> >> How can partitions talk to the software IDT?
> >> POK consists of kernel and partitions. Each partition has a libpok part.
> >> Libpok is the library that enables the partition to talk to other
> >> partitions and the kernel.
> >> An RTEMS guest has a POK partition part (libpart) and the RTEMS part.
> >> Libpart implements the communication with the POK kernel. So when RTEMS
> >> calls some virtualization layer function, the implementation present in
> >> libpart will emit a syscall to the POK kernel and either pass along the
> >> IRQ callback function, or just ask to unregister a handler or to
> >> enable/disable/acknowledge interrupts.
> >> Have a look at the virtualization layer functions in RTEMS's virtualpok
> >> BSP and examples/rtems-guest/ in POK.
> >> The syscall handling then forwards the request to, e.g.,
> >> 'pok_bsp_irq_register_hw'.
> >>
> >>
> >>
> >> I hope that fits into your definition of 'briefly explain'. But it
> >> should give you enough background and explanation to follow the code and
> >> understand the design.
> >>
>
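
The software IDT described in the quoted text above can be sketched in C
roughly as follows. This is only an illustration under assumptions, not the
POK source: 'meta_handler' and 'partition_irq_enabled' are named in the mail,
but the field layout, NUM_PARTITIONS, UNUSED_VECTOR, soft_idt and the helper
functions irq_register, irq_ack and dispatch are hypothetical names chosen
for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HW_IRQ_BASE    32    /* vector of IRQ line 0 (clock) */
#define HW_IRQ_LINES   16    /* lines 0..15 map to vectors 32..47 */
#define NUM_PARTITIONS 2     /* assumed configuration value */
#define NUM_SLOTS      (NUM_PARTITIONS + 1) /* slot 0 is the kernel */
#define KERNEL_SLOT    0
#define UNUSED_VECTOR  0xFFu /* assumed "magic" unused vector number */

typedef void (*irq_handler_t)(unsigned vector);

/* One software-IDT entry: a vector number plus two per-slot tables. */
struct meta_handler {
    unsigned      vector;
    irq_handler_t handler[NUM_SLOTS]; /* NULL = no handler registered */
    uint8_t       waiting[NUM_SLOTS]; /* 1 = ready for the next interrupt */
};

static struct meta_handler soft_idt[HW_IRQ_LINES];
uint8_t partition_irq_enabled[NUM_PARTITIONS]; /* software CLI/STI flag */

/* Start values: unused vector, no handler present, but waiting. */
void meta_handler_init(void)
{
    for (size_t line = 0; line < HW_IRQ_LINES; line++) {
        soft_idt[line].vector = UNUSED_VECTOR;
        for (size_t s = 0; s < NUM_SLOTS; s++) {
            soft_idt[line].handler[s] = NULL;
            soft_idt[line].waiting[s] = 1;
        }
    }
}

/* Initially no partition receives interrupts until it asks for them. */
void partition_irq_init(void)
{
    for (size_t p = 0; p < NUM_PARTITIONS; p++)
        partition_irq_enabled[p] = 0;
}

/* Register a handler for a hardware vector; returns -1 if out of range. */
int irq_register(unsigned vector, unsigned slot, irq_handler_t h)
{
    if (vector < HW_IRQ_BASE || vector >= HW_IRQ_BASE + HW_IRQ_LINES ||
        slot >= NUM_SLOTS)
        return -1;
    soft_idt[vector - HW_IRQ_BASE].vector = vector;
    soft_idt[vector - HW_IRQ_BASE].handler[slot] = h;
    return 0;
}

/* The partition signals that it finished servicing the interrupt. */
void irq_ack(unsigned vector, unsigned partition)
{
    soft_idt[vector - HW_IRQ_BASE].waiting[partition + 1] = 1;
}

/* General handler: kernel handler first, then the current partition's. */
void dispatch(unsigned vector, unsigned partition)
{
    assert(vector >= HW_IRQ_BASE && vector < HW_IRQ_BASE + HW_IRQ_LINES);
    assert(partition < NUM_PARTITIONS);
    struct meta_handler *mh = &soft_idt[vector - HW_IRQ_BASE];
    unsigned slot = partition + 1;

    if (mh->handler[KERNEL_SLOT])
        mh->handler[KERNEL_SLOT](vector);
    if (partition_irq_enabled[partition] && mh->handler[slot] &&
        mh->waiting[slot]) {
        mh->waiting[slot] = 0; /* busy until the partition acknowledges */
        mh->handler[slot](vector);
    }
}

/* Example handlers: count clock ticks seen by the kernel and by a guest. */
int kernel_ticks, guest_ticks;
void kernel_clock(unsigned vector) { (void)vector; kernel_ticks++; }
void guest_clock(unsigned vector)  { (void)vector; guest_ticks++; }
```

With both example handlers registered on vector 32, the guest handler only
runs once partition_irq_enabled is set for that partition, and it is skipped
again until the partition acknowledges the previous interrupt.
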

Yes, it's more detailed than I expected. Thank you very much. I will
work through the code as soon as I can.

> >> The really nasty bit happens in the '_C_isr_handler' function in
> >> x86-qemu/bsp.c.
> >> This is explained in my RTLWS'13 paper.
> > Link to paper please.
>
> https://wwwpub.zih.tu-dresden.de/~s8940405/rtlws13_rtems_in_pok_partitions.pdf
> >
> >> In short: Each IRQ entry builds a stack frame that saves the register
> >> values on the stack when the interrupt occurs, so we can continue
> >> execution at the same point.
> >> To handle the IRQ in user space and to return to the point of
> >> interruption, the user space handler needs this data. So the interrupt
> >> frame is copied from the kernel stack to the user stack. Then 'iret'
> >> makes the kernel-space to user-space transition. And that's where we get
> >> a GeneralProtectionFault.
> >>
> > Can we just not use iret from the paravirtualized guest (RTEMS)?
> With kernel-space and kernel stack, I mean the POK kernel-space and
> stack. Sorry, I should have made that clear.
>
In fact, iret is a sensitive instruction in x86 paravirtualization. We
have to replace it with a hypercall (syscall) in RTEMS.
I will start from the holes in x86 virtualization to decide which hypercalls
will be necessary.

As for the iret General Protection Fault in POK, I need some time to
review the code.
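
As a reference point for that review, the frame copy described in the quoted
text above could be sketched like this. It is a simplified model, not the POK
code: it only covers the five words an x86 CPU pushes when an interrupt
crosses privilege levels (the frame in POK presumably also holds the saved
general-purpose registers), and iret_frame and push_frame_to_user_stack are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Words pushed by the CPU on an interrupt with a privilege change,
 * lowest address first. iret pops them in the same order. */
struct iret_frame {
    uint32_t eip;
    uint32_t cs;
    uint32_t eflags;
    uint32_t esp;
    uint32_t ss;
};

/* Copy the kernel-side frame below the current top of the user stack
 * and return the new user stack pointer. The stack grows downwards. */
uint32_t *push_frame_to_user_stack(uint32_t *user_sp,
                                   const struct iret_frame *kframe)
{
    assert(user_sp != NULL && kframe != NULL);
    user_sp -= sizeof *kframe / sizeof *user_sp; /* reserve five words */
    memcpy(user_sp, kframe, sizeof *kframe);
    return user_sp;
}
```

The user-space handler can then restore the interrupted context from the
copied words, which is the part that must not rely on a privileged iret.
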

> > This problem reminds me of https://lkml.org/lkml/2011/12/16/460
> Interesting, I'll have a look.
>
> >
> >> Have also a look at the interrupt_middleman function in
> >> rtems-guest/hello.c. This is the user space recovery code of the stack
> >> frame.
> >>
> >>
> >> Cheers,
> >> Philipp
> >>
> >>
> >>
> >> p.s.
> >> This page has a couple of good tutorials for low level OS programming:
> >> http://www.brokenthorn.com/Resources/OSDev15.html
>
>