Memory Protection project interface details (GSoC 2020)

Wed May 13 16:56:07 UTC 2020

On Wed, May 13, 2020 at 9:41 AM Gedare Bloom <gedare at rtems.org> wrote:

> On Wed, May 13, 2020 at 6:12 AM Utkarsh Rai <utkarsh.rai60 at gmail.com>
> wrote:
> >
> >
> >
> > On Wed, May 13, 2020 at 3:56 AM Joel Sherrill <joel at rtems.org> wrote:
> >>
> >> I hate to top post but I'm not sure where to insert this.
> >>
> >> As a first step, I would ensure that paging catches stack overflow errors.
> >
> >
> > I understand that any thread-stack protection mechanism has to guard
> > against stack overflow errors. Adding this to my plan would mean that I
> > have to readjust my goals, one of them being support for typed-memory
> > objects. I would like your opinion on which features are more valuable
> > to the community so I can focus on them.
> >
>
> When Joel says stack overflow he means a thread writing beyond the
> thread's stack area. I don't think this would require TYM? Usually the
> way this is done is to add a gap between stacks when you allocate
> them, which is a bit wasteful of physical memory.
>

A gap in the virtual address?  I guess if you map virtual:physical 1:1,
then it is a gap in both. But technically, it doesn't have to be 1:1.

Independent of being pedantic on that, enlighten me how using the MMU
for stack overflow protection isn't a logical step on the path of this
project?
It is just a matter of mapping the thread stacks and letting every thread
have access to all threads' memory, but leaving gaps between the stacks.

I have seen the basic benefits of an MMU as follows (increasing in
complexity and dynamism):

+ Mark regions of memory as read/write, execute, read only, no cache,
  and for stack use only. What you can do depends on the architecture.
  This is a coarse form of memory protection using large areas of memory.
  This is usually done at BSP startup and left alone.

+ Provide each thread a protected area where overflow/underflow is
  detected by the MMU. This requires finer grain control but is still
  simple since it is only changed at run-time by the thread stack
  allocator/deallocator.

+ Finer grain control of which threads can access which other threads'
  stacks. This will require more management, and something has to happen
  at context switch.

+ Very fine grain control of objects like CHERI and other similar
  approaches.

This project is aiming for the third. Wouldn't it need to move through the
first two capabilities?
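
To make the second capability concrete, here is a minimal sketch of a stack
allocator that leaves an inaccessible gap below each stack. It uses POSIX
mmap()/mprotect() on a hosted system, not RTEMS internals, and it assumes the
stack grows downward. The guarded_stack type and the guarded_stack_* function
names are made up for illustration; nothing here is an existing RTEMS API.

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical descriptor for illustration; not an RTEMS type. */
typedef struct {
  void  *base;        /* start of the mapping, i.e. the guard gap */
  void  *stack_low;   /* lowest usable stack address */
  size_t stack_size;  /* usable bytes, rounded up to whole pages */
  size_t guard_size;  /* bytes in the inaccessible gap */
} guarded_stack;

/* Map one guard page plus the stack, then revoke access to the guard. */
static int guarded_stack_alloc(guarded_stack *gs, size_t want)
{
  size_t page = (size_t) sysconf(_SC_PAGESIZE);
  size_t size = (want + page - 1) / page * page;
  void  *p = mmap(NULL, page + size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

  if (p == MAP_FAILED)
    return -1;
  if (mprotect(p, page, PROT_NONE) != 0) {
    munmap(p, page + size);
    return -1;
  }
  gs->base = p;
  gs->stack_low = (char *) p + page;
  gs->stack_size = size;
  gs->guard_size = page;
  return 0;
}

static int guarded_stack_free(guarded_stack *gs)
{
  return munmap(gs->base, gs->guard_size + gs->stack_size);
}

/* Returns 1 if a write just below the stack is trapped by the MMU.
   The write happens in a forked child so the caller survives. */
static int guarded_stack_traps_overflow(const guarded_stack *gs)
{
  int   status = 0;
  pid_t pid = fork();

  if (pid < 0)
    return -1;
  if (pid == 0) {
    volatile char *below = (volatile char *) gs->stack_low - 1;
    *below = 0;   /* lands in the guard gap: expected to fault */
    _exit(0);     /* reached only if the guard did not work */
  }
  if (waitpid(pid, &status, 0) != pid)
    return -1;
  return WIFSIGNALED(status) &&
         (WTERMSIG(status) == SIGSEGV || WTERMSIG(status) == SIGBUS);
}
```

The only run-time changes to the mappings are in the allocate and free paths,
which is why this step needs nothing at context switch. The cost is one extra
page of address space per stack; whether that also costs physical memory
depends on whether the mapping is 1:1.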

>
> >> Then move on to an optional capability where threads cannot access the
> >> stacks of other threads. POSIX does not say anything about whether that
> >> should work or not but there are cases (especially in RTEMS) where if a
> >> blocking thread has a message queue buffer on a stack and another thread
> >> does a send, then it will write to another thread's stack.
> >
> >
> > One of the most important parts of this project will be proper
> > documentation, so that users understand the cases where enabling this
> > feature will cause problems and how to handle them. (In all the cases
> > where users need stack sharing, they will have to explicitly make
> > certain calls to share stacks.)
> >>
> >> This is accepted programming practice.
> >
> >
> >>
> >>
> >> I'm not opposed to an option where per-thread stack protection is
> >> available. But making it mandatory is a bad step. Using the MMU to detect
> >> stack overflow is good.
> >
> >
> > Yes, the thread-stack protection will be user-configurable.
> >>
> >>
> >> As Hesham mentioned, this is hard on some architectures if you don't
> >> have a nice page management system. Can you make sure the minimum
> >> processor architectural requirements are documented? Not just in an
> >> email. This will ultimately be information in the CPU Supplement.
> >
> > Noted.
> >>
> >>
> >> --joel
> >>
> >> On Tue, May 12, 2020 at 5:02 PM Hesham Almatary <
> hesham.almatary at cl.cam.ac.uk> wrote:
> >>>
> >>> On Tue, 12 May 2020 at 04:57, Gedare Bloom <gedare at rtems.org> wrote:
> >>> >
> >>> > On Thu, May 7, 2020 at 9:59 PM Hesham Almatary
> >>> > <hesham.almatary at cl.cam.ac.uk> wrote:
> >>> > >
> >>> > > Hello Utkarsh,
> >>> > >
> >>> > > I'd suggest you don't spend too much effort on setting up BBB
> >>> > > hardware if you haven't already. Debugging on QEMU with GDB is way
> >>> > > easier, and you can consider either the qemu-xilinx-zynq-a9 or rpi2
> >>> > > BSPs. Later, you can move your code to BBB if you want, since both
> >>> > > are based on ARMv7.
> >>> > +1
> >>> >
> >>> > I thought past work has also used psim successfully? Or am I
> >>> > mistaken there?
> >>> >
> >>> Before my 2012 project (and part of it, yes), we used psim. The use of
> >>> software TLBs wasn't very ideal/easy though, so we moved to ARM in
> >>> 2013. The development/testing was mainly on a RPi board. I don't
> >>> remember there being a QEMU model for it yet.
> >>>
> >>>
> >>> > >
> >>> > > On Thu, 7 May 2020 at 18:26, Utkarsh Rai <utkarsh.rai60 at gmail.com>
> wrote:
> >>> > > >
> >>> > > > Hello,
> >>> > > > This is to ensure that all the interested parties are on the
> >>> > > > same page before I start my project and can give their
> >>> > > > invaluable feedback.
> >>> > Excellent, thank you for taking the initiative.
> >>> >
> >>> > I'll be taking on the primary mentorship for your project, with
> >>> > support from the co-mentors (Peter, Hesham, Sebastian). For now, I
> >>> > prefer you to continue to maintain your presence on the mailing list.
> >>> > We will establish other forms of communication as needed and will
> >>> > take on IRC meetings once coding begins in earnest.
> >>> >
> >>> > > > My GSoC project, providing user-configurable thread stack
> >>> > > > protection, requires adding architecture-specific low-level
> >>> > > > support as well as high-level API support. I will be starting my
> >>> > > > project with the ARMv7-A (on BBB) MMU since RTEMS already has
> >>> > > > quite mature support for it. As already mentioned in my proposal,
> >>> > > > I will be focusing more on the high-level interface and let it
> >>> > > > drive whatever further low-level support is needed.
> >>> > > > Once the application uses the MMU for thread stack address
> >>> > > > generation, each thread will be automatically protected, as the
> >>> > > > page tables other than that of the executing thread would be made
> >>> > > > dormant. When the user has to share thread stacks, they will have
> >>> > > > to obtain the stack attributes of the threads to be shared with
> >>> > > > pthread_attr_getstack(), then get a file descriptor of the memory
> >>> > > > to be mapped by a call to shm_open(), and finally map this to the
> >>> > > > stack of the other thread through mmap(). This is the
> >>> > > > POSIX-compliant way I could think of. Now at the low level, it
> >>> > > > means mapping the page table of the thread to be shared into the
> >>> > > > address space of the executing thread. This is an area where the
> >>> > > > low-level support has to be provided. At the high level, this
> >>> > > > means providing support to the mmap and shared-memory interface,
> >>> > > > as mmap provides support for a file by simply copying the memory
> >>> > > > from the file to the destination. For shared memory objects it
> >>> > > > can provide read/write access but cannot provide restriction of
> >>> > > > write/read access. One of the areas that I have to look into in
> >>> > > > more detail is the thread context switch, as after every context
> >>> > > > switch the TLBs need to be flushed and reinitialized lest we get
> >>> > > > an invalid address for the executing thread. Since the context
> >>> > > > switch is low-level and architecture-specific, this also has to
> >>> > > > be provided with more support.
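
For reference, the shm_open()/mmap() part of the sequence described above
could look roughly like the sketch below on a hosted POSIX system. It maps one
shared memory object twice, once read/write for the owner and once read-only
for a peer, which is the kind of access restriction the text says still needs
support. The shared_region type and the shared_region_* function names are
hypothetical, the pthread_attr_getstack() step is left out, and none of this
reflects the current RTEMS mmap implementation. On older glibc versions it may
need to be linked with -lrt.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical descriptor for illustration; not an RTEMS type. */
typedef struct {
  char       *owner_view;  /* read/write mapping for the owning thread */
  const char *peer_view;   /* read-only mapping of the same memory */
  size_t      size;
} shared_region;

/* Create a shared memory object and map it with two different rights. */
static int shared_region_create(shared_region *r, const char *name,
                                size_t size)
{
  void *rw;
  void *ro;
  int   fd = shm_open(name, O_CREAT | O_RDWR, 0600);

  if (fd < 0)
    return -1;
  shm_unlink(name);  /* the descriptor keeps the object alive */
  if (ftruncate(fd, (off_t) size) != 0) {
    close(fd);
    return -1;
  }
  rw = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  ro = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
  close(fd);         /* the mappings stay valid after close */
  if (rw == MAP_FAILED || ro == MAP_FAILED) {
    if (rw != MAP_FAILED)
      munmap(rw, size);
    if (ro != MAP_FAILED)
      munmap(ro, size);
    return -1;
  }
  r->owner_view = rw;
  r->peer_view = ro;
  r->size = size;
  return 0;
}

static int shared_region_destroy(shared_region *r)
{
  int a = munmap(r->owner_view, r->size);
  int b = munmap((void *) r->peer_view, r->size);

  return (a == 0 && b == 0) ? 0 : -1;
}
```

Data written through owner_view is visible through peer_view because both
mappings are MAP_SHARED views of the same object, while a store through the
read-only view would fault.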
> >>> >
> >>> > This is really dense text. Try to break apart your writing a little
> >>> > bit to help clarify your thoughts.  You should also translate some of
> >>> > your proposal into a wiki page if you haven't started that yet, and a
> >>> > blog post. Both of those will help to focus your thoughts into words.
> >>> >
> >>> > "mapping the page table" is not meaningful to me. I think you mean
> >>> > something like "mapping a page from the page table"?  Will the design
> >>> > support sharing task stacks using MPUs with 4 regions? 8?  (It seems
> >>> > challenging to me, but might be possible in some limited
> >>> > configurations. Having support for those kinds of targets might still
> >>> > be useful, with the caveat that sharing stacks is not possible.)
> >>> >
> >>> > The first step is to get a BSP running that has sufficient
> >>> > capabilities for you to test out memory protection with. Do a little
> >>> > bit of digging, but definitely simulation is the way to go.
> >>> >
> >>> > The second step from my perspective is to determine how to introduce
> >>> > strict isolation between task stacks. Don't worry about sharing at
> >>> > this stage, but rather can you completely isolate tasks? Then you can
> >>> > start to poke holes in the isolation.
> >>> >
> >>> > As you say, you'll also need to start to understand the context
> >>> > switch code. Start looking into it to determine where you might think
> >>> > to implement changing the address space of the executing thread.
> >>> > Another challenge is that RTEMS can dispatch to a new task from the
> >>> > interrupt handler, which may cause some problems for you as well to
> >>> > handle.
> >>> >
> >>> > Have you figured out where in the code thread stacks are allocated?
> >>> > How do you plan to make the thread stacks known to other threads?
> >>> >
> >>> > TLB shootdown can be extremely expensive. Try to find ways to
> >>> > optimize that cost earlier rather than later. (One of those cases
> >>> > where premature optimization will be acceptable.) Tagged TLB
> >>> > architectures or those with "superpages" may incur less overhead if
> >>> > you can selectively shoot-down the entry (entries) used for task
> >>> > stacks.
> >>> >
> >>> > A final thought is that a method to configure this support is
> >>> > necessary. Configuration is undergoing some heavy changes lately, and
> >>> > application-level configuration is going to be completely different
> >>> > in rtems6. You may want to consider raising a new thread with CC to
> >>> > Sebastian to get his input on what the best way to configure
> >>> > something like this might be, now and in the future. I would have
> >>> > leaned toward a high-level configure switch
> >>> > (--enable-task-protection) in the past, but now I don't know. This
> >>> > capability is however something that should be considered disabled
> >>> > by default due to the extra overhead.
> >>> >
> >>> > Gedare
> >>> >
> >>> > > > Kindly provide your feedback if I have missed something or I
> >>> > > > have a wrong idea about it.
> >>> > > >
> >>> > > > Regards,
> >>> > > > Utkarsh Rai.
> >>> > > >
> >>> > > > _______________________________________________
> >>> > > > devel mailing list
> >>> > > > devel at rtems.org
> >>> > > > http://lists.rtems.org/mailman/listinfo/devel
>