Help on how to configure for user-defined memory protection support (GSoC 2020)

Wed May 27 00:12:22 UTC 2020

On Mon, May 25, 2020 at 9:32 PM Gedare Bloom <gedare at rtems.org> wrote:

> On Mon, May 25, 2020 at 5:39 AM Utkarsh Rai <utkarsh.rai60 at gmail.com>
> wrote:
> >
> >
> > On Fri, May 22, 2020, at 10:59 AM Gedare Bloom <gedare at rtems.org> wrote:
> >>
> >> >  This means that our low-level design for providing thread stack
> protection may look something like this:
> >> >
> >> > 1. For MPU-based processors, the number of protected stacks will
> depend on the number of protection domains, i.e. for MPUs with 8 protection
> domains we can have 7 protected stacks (one region will be assigned to
> global data). For MMU-based systems we will have a section (a page of
> size 1 MB) for global data, and the task address space will be divided into
> smaller pages. Page sizes will be decided by keeping in mind the number of
> TLB entries, in the manner I have described above in the thread.
> >> >
> >> There is value to defining a few of the global regions. I'll assume
> >> R/W/X permissions. Then code (.text) should be R/X. read-only data
> >> sections should be grouped together and made R. Data sections should
> >> be RW. And then stacks should be added to the end. The linker scripts
> >> should be used to group the related sections together. I think some
> >> ARM BSPs do some of this already.  That seems like a minimally useful
> >> configuration for most users who would care: they also want
> >> protection of code from accidental overwrite, and probably data too,
> >> and non-executable data in general. You also may have to consider a
> >> few more permission complications (shared/cacheable) depending on the
> >> hardware.
> >
> >
> > The low-level MMU implementation for ARMv7 BSPs has
> 'ARMV7_CP15_START_DEFAULT_SECTIONS', which lists various regions with
> appropriate permissions; these regions are then grouped by a linker script.
> This should be the standard way of handling the placement of statically
> allocated regions.
> >
> >> >  2. The protection, size, page table, and sharing attributes of each
> created thread will be tracked.
> >> >
> >> I'd rather we not be calling this a page table. MPU-based systems
> >> don't have a notion of page table. But maybe it is OK as long as we
> >> understand that you mean the data structure responsible for mapping
> >> out the address space. I'm not sure what you mean by size, unless you
> >> refer to that thread's stack.
> >>
> >> >  3. At every context switch, these attributes will be updated. The
> static global regions will be assigned a global ASID and will not change
> during the switch; only the protected regions will be updated.
> >> >
> >> Yes, assuming the hardware supports ASIDs and a global attribute.
> >>
> >> I don't know if you will be able to pin the global entries in
> >> hardware. You'll want to keep an eye out for that. If not, you might
> >> need to do something in software to ensure they don't get evicted
> >> (e.g., touch them all before finishing a context switch assuming LRU
> >> replacement).
> >>
> >> >  4. Whenever we share stacks, the page table entries of the shared
> stack, with the access bits as specified by the mmap/shm high-level APIs,
> will be installed for the current thread. This is different from simply
> providing the page table base address of the shared thread stack (what if
> the user wants to make the shared stack read-only from another thread
> while the 'original' thread is r/w enabled?). We will also have to update
> the TLB by installing the shared regions while the global regions remain
> untouched.
> >> >
> >>
> >> Correct. I think we need to make a design decision whether a stack can
> >> exceed one page. It will simplify things if we can assume that, but it
> >> may limit applications unnecessarily. Have to think on that.
> >
> >
> > If we go with the above assumption, we will need to increase the size of
> the page, i.e. use pages of 16 KiB or 64 KiB. Most applications won't
> require stacks of this size, so this will result in wasted memory for each
> thread. I think it would be better if we allow multiple pages, as most
> applications will have stacks that fit in a single 4 KiB page anyway.
> >
>
> I mis-typed. I meant I think we can assume stacks fit in one page. It
> would be impossible to deal with otherwise.
>
> >>
> >> The "page table base address" points to the entire structure that maps
> >> out a thread's address space, so you'd have to walk it to find the
> >> entry/entries for its stack. So, definitely not something you'd want
> >> to do.
> >>
> >> The shm/mmap should convey the privileges to the requesting thread
> >> asking to share. This will result in adding the shared entry/entries
> >> to that thread's address space, with the appropriately set
> >> permissions. So, if the entry is created with read-only permission,
> >> then that is how the thread will be sharing. The original thread's
> >> entry should not be modified by the addition of an entry in another
> >> thread for the same memory region.
> >>
> >> I lean toward thinking it is better to always pay for the TLB miss at
> >> the context switch, which might mean synthesizing accesses to the
> >> entries that might have been evicted in case hardware restricts the
> >> ability of sw to install/manipulate TLB entries directly. That is
> >> something worth looking at more though. There is definitely a tradeoff
> >> between predictable costs and throughput performance. It might be
> >> worth implementing both approaches.
> >>
> >> Gedare
> >
> >
> > We also need to consider the cases where stack sharing would be
> necessary:
> >
> > - An explicit case is when an application gets the stack attributes of
> a thread with pthread_attr_getstack() and then accesses that stack from
> another thread.
> >
> > - An implicit case is when a thread places the address of an
> object from its stack onto a message queue and other threads
> access it. In general, all blocking reads (sockets, files, etc.) will
> share stacks.
> >
> > This will be documented so that the user first shares the required
> stacks and then performs the above operations.
> >
>
> Yes. It may also be worth thinking whether we can/should "relocate"
> stacks when they get shared and spare TLB entries are low. This would
> be a dynamic way to consolidate regions, while a static way would rely
> on some configuration method to declare ahead of time which stacks may
> be shared, or to require the stack allocator (hook) to manage that
> kind of complexity.
>

Sorry, but I am not sure I clearly understand what you are suggesting.
Does relocating stacks mean moving them to the same virtual address as
the thread stack they are being shared with, but with a different ASID?
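
For reference, the sharing semantics discussed above (point 4) could be
sketched roughly as follows. This is only an illustration under my own
assumptions: every name here (address_space, region_entry, as_add_region,
as_share_region, MAX_REGIONS, PERM_*) is hypothetical and is not an existing
RTEMS API, and the fixed table of 8 entries simply mirrors the 8-region MPU
example. The point it shows is that sharing adds a new entry to the
requesting thread's address space with its own permissions, while the owner's
entry is left unmodified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; none of them are RTEMS APIs. */
#define MAX_REGIONS 8u          /* e.g. an MPU with 8 protection regions */
#define PERM_R (1u << 0)
#define PERM_W (1u << 1)
#define PERM_X (1u << 2)

/* One mapped region (a stack, or a global region such as .text or .data). */
typedef struct {
  uintptr_t base;
  size_t    size;
  uint32_t  perms;
  bool      in_use;
} region_entry;

/* Per-thread structure that maps out the thread's address space. */
typedef struct {
  region_entry regions[MAX_REGIONS];
} address_space;

/* Add an entry to an address space; returns its index, or -1 if it is full. */
static int as_add_region(address_space *as, uintptr_t base, size_t size,
                         uint32_t perms)
{
  for (size_t i = 0; i < MAX_REGIONS; ++i) {
    if (!as->regions[i].in_use) {
      as->regions[i] = (region_entry){ base, size, perms, true };
      return (int)i;
    }
  }
  return -1;
}

/*
 * Share the owner's region with another thread. The sharer gets its own
 * entry for the same memory with the permissions it was granted; the
 * owner's entry is only read, never modified.
 */
static int as_share_region(const address_space *owner, int owner_idx,
                           address_space *sharer, uint32_t perms)
{
  assert(owner_idx >= 0 && (size_t)owner_idx < MAX_REGIONS);
  const region_entry *src = &owner->regions[owner_idx];
  if (!src->in_use) {
    return -1;
  }
  return as_add_region(sharer, src->base, src->size, perms);
}
```

With this layout, a thread whose stack is mapped PERM_R | PERM_W can be
shared read-only with another thread by passing PERM_R to as_share_region();
a return value of -1 would be the point where running out of spare entries
has to be handled, which I think is where the relocation idea comes in.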