Help on how to configure for user-defined memory protection support (GSoC 2020)

Utkarsh Rai utkarsh.rai60 at gmail.com
Fri May 29 00:24:28 UTC 2020


On Wed, May 27, 2020 at 8:29 PM Gedare Bloom <gedare at rtems.org> wrote:

> On Tue, May 26, 2020 at 6:12 PM Utkarsh Rai <utkarsh.rai60 at gmail.com>
> wrote:
> >
> >
> >
> > On Mon, May 25, 2020 at 9:32 PM Gedare Bloom <gedare at rtems.org> wrote:
> >>
> >> On Mon, May 25, 2020 at 5:39 AM Utkarsh Rai <utkarsh.rai60 at gmail.com>
> wrote:
> >> >
> >> >
> >> > On Fri, May 22, 2020, at 10:59 AM Gedare Bloom <gedare at rtems.org>
> wrote:
> >> >>
> >> >> >  This means that our low-level design for providing thread stack
> protection may look something like this:-
> >> >> >
> >> >> > 1. For MPU-based processors, the number of protected stacks will
> depend on the number of protection domains, i.e. for MPUs with 8 protection
> domains we can have 7 protected stacks (one of the regions will be assigned
> to global data). For MMU-based systems we will have a section (a page of
> size 1MB) for global data, and the task address space will be divided into
> smaller pages; page sizes will be decided by keeping in mind the number of
> TLB entries, in the manner I have described above in the thread.
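To make the numbers above concrete, something like the following is what
I have in mind (all values are illustrative and architecture-dependent,
not an existing RTEMS definition):

/* Illustrative constants for the layout described above, assuming an
 * ARMv7-A style MMU and a typical 8-region MPU. */
#define GLOBAL_SECTION_SIZE (1024U * 1024U)       /* one 1MiB MMU section */
#define STACK_PAGE_SIZE     (4U * 1024U)          /* one small page/stack */
#define MPU_REGION_COUNT    8U                    /* common minimum       */
#define PROTECTED_STACKS    (MPU_REGION_COUNT - 1U) /* 1 kept for globals */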
> >> >> >
> >> >> There is value to defining a few of the global regions. I'll assume
> >> >> R/W/X permissions. Then code (.text) should be R/X. Read-only data
> >> >> sections should be grouped together and made R. Data sections should
> >> >> be RW. And then stacks should be added to the end. The linker scripts
> >> >> should be used to group the related sections together. I think some
> >> >> ARM BSPs do some of this already. That seems like a minimally useful
> >> >> configuration for most users that would care: they also want
> >> >> protection of code from accidental overwrite, probably data too,
> >> >> and non-executable data in general. You may also have to consider a
> >> >> few more permission complications (shared/cacheable) depending on the
> >> >> hardware.
> >> >
> >> >
> >> > The low-level MMU implementation for ARMv7 BSPs has an
> 'ARMV7_CP15_START_DEFAULT_SECTIONS' table, which lists out the various
> regions with appropriate permissions; these are then grouped by a linker
> script. This should be the standard way of handling the placement of
> statically allocated regions.
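A sketch of the kind of region table I mean (the struct, flags, and
linker symbol names below are illustrative, not the actual ARMv7 CP15
definitions):

#include <stdint.h>

/* Hypothetical region descriptor table; the linker script defines the
 * begin/end symbols by grouping the related sections together. */
typedef struct {
  const char *begin;
  const char *end;
  uint32_t    attr; /* bitwise-or of the flags below */
} region_desc;

#define REGION_READ  0x1U
#define REGION_WRITE 0x2U
#define REGION_EXEC  0x4U

extern const char _text_begin[],   _text_end[];
extern const char _rodata_begin[], _rodata_end[];
extern const char _data_begin[],   _data_end[];

static const region_desc global_regions[] = {
  { _text_begin,   _text_end,   REGION_READ | REGION_EXEC }, /* R/X */
  { _rodata_begin, _rodata_end, REGION_READ },               /* R   */
  { _data_begin,   _data_end,   REGION_READ | REGION_WRITE } /* RW  */
};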
> >> >
> >> >> >  2. The protection, size, page table, and sharing attributes of
> each created thread will be tracked.
> >> >> >
> >> >> I'd rather we not be calling this a page table. MPU-based systems
> >> >> don't have a notion of page table. But maybe it is OK as long as we
> >> >> understand that you mean the data structure responsible for mapping
> >> >> out the address space. I'm not sure what you mean by size, unless you
> >> >> refer to that thread's stack.
> >> >>
> >> >> >  3. At every context switch, these attributes will be updated. The
> static-global regions will be assigned a global ASID and will not change
> during the switch; only the protected regions will be updated.
> >> >> >
> >> >> Yes, assuming the hardware supports ASIDs and a global attribute.
> >> >>
> >> >> I don't know if you will be able to pin the global entries in
> >> >> hardware. You'll want to keep an eye out for that. If not, you might
> >> >> need to do something in software to ensure they don't get evicted
> >> >> (e.g., touch them all before finishing a context switch assuming LRU
> >> >> replacement).
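As a sketch of what I expect the switch-time update to look like on an
MPU system (mpu_set_region() and the types below are hypothetical
placeholders, not an existing RTEMS or HAL API):

#include <stddef.h>
#include <stdint.h>

#define STACK_REGION_INDEX 7   /* the one region left after the globals */
#define MPU_ATTR_RW        0x3U
#define MPU_ATTR_NO_EXEC   0x4U

typedef struct {
  void  *stack_base;
  size_t stack_size;
} thread_info;

/* Hypothetical HAL call that programs one MPU region. */
void mpu_set_region(unsigned index, void *base, size_t size, uint32_t attr);

/* The global regions stay installed across the switch; only the region
 * covering the heir thread's stack is reprogrammed. */
void stack_prot_context_switch(const thread_info *heir)
{
  mpu_set_region(STACK_REGION_INDEX, heir->stack_base,
                 heir->stack_size, MPU_ATTR_RW | MPU_ATTR_NO_EXEC);
}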
> >> >>
> >> >> >  4. Whenever we share stacks, the page table entries of the shared
> stack, with the access bits as specified by the mmap/shm high-level APIs,
> will be installed for the current thread. This is different from simply
> providing the page table base address of the shared thread-stack (what if
> the user wants to make the shared stack only readable from another thread
> while the 'original' thread is r/w enabled?). We will also have to update
> the TLB by installing the shared regions while the global regions remain
> untouched.
> >> >> >
> >> >>
> >> >> Correct. I think we need to make a design decision whether a stack
> can
> >> >> exceed one page. It will simplify things if we can assume that, but
> it
> >> >> may limit applications unnecessarily. Have to think on that.
> >> >
> >> >
> >> > If we go with the above assumption, we will need to increase the size
> of the page, i.e. pages of 16KiB or 64KiB. Most applications won't require
> stacks of this size, and this will result in wasted memory for each
> thread. I think it would be better if we have multiple pages, as most
> applications will have stacks that fit in a single 4KiB page anyway.
> >> >
> >>
> >> I mis-typed. I meant I think we can assume stacks fit in one page. It
> >> would be impossible to deal with otherwise.
> >>
> >> >>
> >> >> The "page table base address" points to the entire structure that
> maps
> >> >> out a thread's address space, so you'd have to walk it to find the
> >> >> entry/entries for its stack. So, definitely not something you'd want
> >> >> to do.
> >> >>
> >> >> The shm/mmap should convey the privileges to the requesting thread
> >> >> asking to share. This will result in adding the shared entry/entries
> >> >> to that thread's address space, with the appropriately set
> >> >> permissions. So, if the entry is created with read-only permission,
> >> >> then that is how the thread will be sharing. The original thread's
> >> >> entry should not be modified by the addition of an entry in another
> >> >> thread for the same memory region.
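A minimal sketch of what installing such a shared entry might look like
(address_space, as_install_entry(), and the attribute flags are all
hypothetical, for illustration only):

#include <stddef.h>
#include <stdint.h>

#define AS_ATTR_READ  0x1U
#define AS_ATTR_WRITE 0x2U

typedef struct address_space address_space;

/* Hypothetical call that adds one mapping to a thread's address-space
 * structure; the 1:1 pa:va mapping is preserved. */
void as_install_entry(address_space *as, void *base, size_t size,
                      uint32_t attr);

/* Thread B gets a read-only view of thread A's stack; A's own entry
 * stays read-write and is not modified by this call. */
void share_stack_read_only(address_space *b_as, void *a_stack_base,
                           size_t a_stack_size)
{
  as_install_entry(b_as, a_stack_base, a_stack_size, AS_ATTR_READ);
}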
> >> >>
> >> >> I lean toward thinking it is better to always pay for the TLB miss at
> >> >> the context switch, which might mean synthesizing accesses to the
> >> >> entries that might have been evicted in case hardware restricts the
> >> >> ability of sw to install/manipulate TLB entries directly. That is
> >> >> something worth looking at more though. There is definitely a
> tradeoff
> >> >> between predictable costs and throughput performance. It might be
> >> >> worth implementing both approaches.
> >> >>
> >> >> Gedare
> >> >
> >> >
> >> > We also need to consider the cases where stack sharing would be
> necessary:
> >> >
> >> > - We can have explicit cases where an application gets the attributes
> of a thread by pthread_attr_getstack() and then accesses its stack from
> another thread (see the sketch below).
> >> >
> >> > - An implicit case would be when a thread places the address of an
> object from its stack onto a message queue and we have other threads
> accessing it; in general, all blocking reads (sockets, files, etc.) will
> share stacks.
> >> >
> >> > This will be documented so that the user first shares the required
> stacks and then performs the above operations.
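A concrete sketch of the explicit case, assuming pthread_getattr_np()
(a non-portable extension) is available to fetch a running thread's
attributes; the sharing step itself is elided:

#include <pthread.h>
#include <stddef.h>

/* task_b obtains the stack address of thread 'a'; under this design any
 * access through stack_addr would fault unless a's stack has first been
 * shared with task_b (e.g. via the mmap/shm API). */
void *task_b(void *arg)
{
  pthread_t      a = *(pthread_t *) arg;
  pthread_attr_t attr;
  void          *stack_addr;
  size_t         stack_size;

  if (pthread_getattr_np(a, &attr) == 0) {
    pthread_attr_getstack(&attr, &stack_addr, &stack_size);
    /* ... share a's stack first, then it is safe to read through
     * stack_addr ... */
    pthread_attr_destroy(&attr);
  }
  return NULL;
}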
> >> >
> >>
> >> Yes. It may also be worth thinking whether we can/should "relocate"
> >> stacks when they get shared and spare TLB entries are low. This would
> >> be a dynamic way to consolidate regions, while a static way would rely
> >> on some configuration method to declare ahead of time which stacks may
> >> be shared, or to require the stack allocator (hook) to manage that
> >> kind of complexity.
> >
> >
> > Sorry, but I am not sure I clearly understand what you are trying to
> suggest. Does relocating stacks mean moving them to the same virtual
> address as the thread-stack it is being shared with, but with a different ASID?
>
> No. We don't want to break the 1:1 pa:va mappings. That is another
> design constraint, I suppose.
>
> If a user wants to share several sets of task stacks mutually with
> each other, using the same permissions (e.g., RW), then it would be
> efficient to pack the sharing tasks together in the same page/segment
> to use 1 TLB entry for them. This is a thought for an optimization
> down the road, maybe.

> Gedare
>

Got it. Now that we are clear on most of the aspects of the low-level
design (handling the context switch through an interrupt remains), I suppose
we can decide how the high-level user-configuration design/implementation
should look.

- My idea has been to configure the stack protection mechanism in an
application based on the current scheme for configuring a system on RTEMS.
- We can have a 'CONFIGURE_MPU_STACK_PROT' or a 'CONFIGURE_MMU_STACK_PROT'
and a 'CONFIGURE_PROT_NUMBER_STACK' based on the CPU (see the sketch after
this list). The number of protected stacks for an MPU-based CPU would be
the common minimum across all architectures (most of the architectures
provide at least 8 protection domains).
- Since the task stacks are allocated from the RTEMS workspace, depending
upon whether we have an MPU or an MMU we can set the workspace size for
thread stack allocation.
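To illustrate, an application configuration using these proposed options
might look like the following; the two stack-protection options are the
proposed names from above and do not exist in RTEMS yet, while the rest
follows the usual confdefs.h scheme:

#include <rtems.h>

rtems_task Init(rtems_task_argument arg);

#define CONFIGURE_APPLICATION_NEEDS_CLOCK_DRIVER
#define CONFIGURE_APPLICATION_NEEDS_CONSOLE_DRIVER
#define CONFIGURE_MAXIMUM_TASKS 8
#define CONFIGURE_RTEMS_INIT_TASKS_TABLE

#define CONFIGURE_MMU_STACK_PROT         /* proposed, hypothetical */
#define CONFIGURE_PROT_NUMBER_STACK 7    /* proposed, hypothetical */

#define CONFIGURE_INIT
#include <rtems/confdefs.h>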

In parallel, I have started my implementation for isolating
thread-stacks; as a first step I will be isolating two blocks of memory
with appropriate access permissions, and then I will extend this to
thread-stacks.
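A rough sketch of that first step (the section names and sizes are
illustrative; the BSP start-up code would then map the two sections with
different permissions):

/* Two statically allocated, page-aligned blocks placed in their own
 * sections; the MMU/MPU can then give each section its own attributes,
 * e.g. .prot_block_a read-write and .prot_block_b read-only, so that a
 * write to block_b faults. */
static char block_a[4096]
  __attribute__((section(".prot_block_a"), aligned(4096)));
static char block_b[4096]
  __attribute__((section(".prot_block_b"), aligned(4096)));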