Help on how to configure for user-defined memory protection support (GSoC 2020)

Fri May 22 05:29:33 UTC 2020

> This means that our low-level design for providing thread stack protection may look something like this:
>
> 1. For MPU-based processors, the number of protected stacks will depend on the number of protection regions, i.e., an MPU with 8 regions can support 7 protected stacks (one region will be assigned to global data). For MMU-based systems, we will have a section (a 1 MB page) for global data, and the task address space will be divided into smaller pages whose sizes will be chosen with the number of TLB entries in mind, as I described earlier in the thread.
>
There is value in defining a few of the global regions. I'll assume
R/W/X permissions. Then code (.text) should be R/X, read-only data
sections should be grouped together and made R, and data sections
should be RW. Stacks should then be added at the end. The linker
scripts should be used to group the related sections together; I think
some ARM BSPs do some of this already. That seems like a minimally
useful configuration for most users who care: they also want
protection of code from accidental overwrites, probably of data too,
and non-executable data in general. You may also have to consider a
few more permission complications (shared/cacheable) depending on the
hardware.
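A minimal sketch of that global layout, assuming three linker-grouped regions and generic R/W/X flags. The PERM_* names, the global_region type, and both helpers are hypothetical, not an RTEMS or hardware API. It checks that no global region is both writable and executable, and shows the cost in MPU slots: with three global regions, an 8-region MPU leaves 5 regions for stacks rather than the 7 estimated in point 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical permission flags; real MPU/MMU attribute encodings differ per architecture. */
enum {
  PERM_R = 1u << 0,
  PERM_W = 1u << 1,
  PERM_X = 1u << 2
};

/* Hypothetical descriptor for one global region grouped by the linker script. */
typedef struct {
  const char *name;   /* output section group */
  unsigned    perms;  /* combination of PERM_* flags */
} global_region;

/* Layout suggested above: code R/X, read-only data R, data R/W. */
static const global_region global_regions[] = {
  { ".text",       PERM_R | PERM_X },
  { ".rodata",     PERM_R },
  { ".data/.bss",  PERM_R | PERM_W },
};

#define NUM_GLOBAL_REGIONS (sizeof global_regions / sizeof global_regions[0])

/* True if no global region is both writable and executable (W^X). */
static bool global_layout_is_wx_safe(void)
{
  for (size_t i = 0; i < NUM_GLOBAL_REGIONS; ++i) {
    unsigned p = global_regions[i].perms;
    if ((p & PERM_W) && (p & PERM_X))
      return false;
  }
  return true;
}

/* Regions left over for per-thread stacks on an MPU with mpu_regions slots. */
static size_t stack_regions_available(size_t mpu_regions)
{
  assert(mpu_regions >= NUM_GLOBAL_REGIONS);
  return mpu_regions - NUM_GLOBAL_REGIONS;
}
```

Each extra global region (e.g., a separate non-cacheable shared area) directly reduces the number of stacks that can be protected at once on an MPU.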

>  2. The protection, size, page table, and sharing attributes of each created thread will be tracked.
>
I'd rather we not call this a page table. MPU-based systems don't
have a notion of a page table. But maybe it is OK as long as we
understand that you mean the data structure responsible for mapping
out the address space. I'm not sure what you mean by size, unless you
are referring to the size of that thread's stack.
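One way to sidestep the page-table terminology is a per-thread "memory context" that simply lists the protected ranges, which works for both MPU and MMU targets. A sketch, assuming hypothetical names (thread_mem_ctx, mem_entry, MAX_THREAD_ENTRIES, thread_mem_ctx_init) and reading "size" as the size of the thread's stack:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PERM_R = 1u << 0, PERM_W = 1u << 1, PERM_X = 1u << 2 };

#define MAX_THREAD_ENTRIES 4  /* hypothetical per-thread limit */

/* One protected range in a thread's address space (own stack or shared memory). */
typedef struct {
  uintptr_t base;
  size_t    size;
  unsigned  perms;   /* PERM_* flags */
  int       shared;  /* nonzero if mapped from another thread's stack */
} mem_entry;

/* Hypothetical per-thread context: maps out the address space without
 * assuming a hardware page table exists. */
typedef struct {
  uint16_t  asid;                         /* unused on MPU-only hardware */
  size_t    count;
  mem_entry entries[MAX_THREAD_ENTRIES];  /* entries[0] is the thread's own stack */
} thread_mem_ctx;

static void thread_mem_ctx_init(thread_mem_ctx *ctx, uint16_t asid,
                                uintptr_t stack_base, size_t stack_size)
{
  assert(ctx != NULL && stack_size > 0);
  ctx->asid = asid;
  ctx->count = 1;
  /* The owning thread gets read/write, never execute, on its own stack. */
  ctx->entries[0] = (mem_entry){ stack_base, stack_size, PERM_R | PERM_W, 0 };
}
```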

> 3. At every context switch, these attributes will be updated. The static global regions will be assigned a global ASID and will not change during the switch; only the protected regions will be updated.
>
Yes, assuming the hardware supports ASIDs and a global attribute.
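On an MPU, "only the protected regions will be updated" might look like the following sketch. The MPU is simulated as an array, and the slot split (3 global slots, 5 per-thread slots on an 8-region MPU) is an assumption; real code would write architecture-specific registers instead. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MPU_NUM_REGIONS    8  /* hypothetical MPU size */
#define NUM_GLOBAL_REGIONS 3  /* slots 0..2: code, rodata, data; never reprogrammed */

typedef struct {
  uintptr_t base;
  size_t    size;
  unsigned  perms;
  int       enabled;
} mpu_region;

/* Simulated MPU region registers. */
static mpu_region mpu[MPU_NUM_REGIONS];

/* Reprogram only the per-thread slots; the global slots are left untouched. */
static void switch_thread_regions(const mpu_region *thread_regions, size_t n)
{
  assert(n <= MPU_NUM_REGIONS - NUM_GLOBAL_REGIONS);
  for (size_t i = 0; i < MPU_NUM_REGIONS - NUM_GLOBAL_REGIONS; ++i) {
    size_t slot = NUM_GLOBAL_REGIONS + i;
    if (i < n)
      mpu[slot] = thread_regions[i];
    else
      mpu[slot] = (mpu_region){ 0, 0, 0, 0 }; /* disable slots left over from the previous thread */
  }
}
```

Clearing unused slots matters: otherwise the incoming thread would inherit access to the previous thread's stack.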

I don't know if you will be able to pin the global entries in
hardware. You'll want to keep an eye out for that. If not, you might
need to do something in software to ensure they don't get evicted
(e.g., touch them all before finishing a context switch, assuming LRU
replacement).
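A toy model of that software workaround, assuming a tiny fully associative TLB with true LRU replacement. Real TLBs may use pseudo-LRU, round-robin, or random replacement, in which case touching entries gives no guarantee. Here tlb_access() stands in for a load from an address in the page; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_SIZE 4  /* tiny hypothetical fully associative TLB */

typedef struct {
  uintptr_t vpn;       /* virtual page number */
  unsigned  last_use;  /* timestamp for LRU */
  bool      valid;
} tlb_entry;

static tlb_entry tlb[TLB_SIZE];
static unsigned  tick;

/* A hit refreshes recency; a miss fills an invalid slot or evicts the LRU entry. */
static bool tlb_access(uintptr_t vpn)
{
  size_t victim = 0;
  ++tick;
  for (size_t i = 0; i < TLB_SIZE; ++i) {
    if (tlb[i].valid && tlb[i].vpn == vpn) {
      tlb[i].last_use = tick;
      return true; /* hit */
    }
    if (!tlb[victim].valid)
      continue; /* already found an empty slot to fill */
    if (!tlb[i].valid || tlb[i].last_use < tlb[victim].last_use)
      victim = i;
  }
  tlb[victim] = (tlb_entry){ vpn, tick, true };
  return false; /* miss */
}

/* Touch every global page last, so under LRU they are the most recently used. */
static void touch_global_pages(const uintptr_t *global_vpns, size_t n)
{
  assert(n < TLB_SIZE);
  for (size_t i = 0; i < n; ++i)
    (void)tlb_access(global_vpns[i]);
}
```

After the touch, the next thread's misses evict the previous thread's entries first, so the global entries survive.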

> 4. Whenever we share stacks, the page table entries of the shared stack, with the access bits specified through the high-level mmap/shm APIs, will be installed for the current thread. This is different from simply providing the page table base address of the shared thread stack (what if the user wants to make the shared stack only readable from another thread while the 'original' thread has r/w access?). We will also have to update the TLB by installing the shared regions while the global regions remain untouched.
>

Correct. I think we need to make a design decision about whether a
stack can exceed one page. It would simplify things if we could assume
it cannot, but that may limit applications unnecessarily. I have to
think on that.
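The arithmetic behind that decision, as a sketch: a page-aligned stack needs ceil(size / page_size) entries, so anything above one page multiplies the per-thread cost in TLB entries. On an ARMv7-M style MPU, a region size must also be a power of two of at least 32 bytes, so stacks get rounded up. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pages (TLB entries) needed for a page-aligned stack; page_size must be a power of two. */
static size_t stack_pages(size_t stack_size, size_t page_size)
{
  assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
  return (stack_size + page_size - 1) / page_size;
}

/* Smallest ARMv7-M style region size (power of two, >= 32 bytes) covering the stack. */
static size_t mpu_region_size(size_t stack_size)
{
  assert(stack_size <= (SIZE_MAX >> 1) + 1); /* avoid shifting past the top bit */
  size_t size = 32;
  while (size < stack_size)
    size <<= 1;
  return size;
}
```

For example, a 6000-byte stack needs two 4 KiB pages, while with 1 MB sections any reasonable stack fits in one.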

The "page table base address" points to the entire structure that maps
out a thread's address space, so you'd have to walk it to find the
entry/entries for its stack. That is definitely not something you'd
want to do.

The shm/mmap call should convey the granted privileges to the thread
asking to share. This will result in adding the shared entry/entries
to that thread's address space, with the appropriately set
permissions. So, if the entry is created with read-only permission,
then that is how the thread will be sharing. The original thread's
entry should not be modified by the addition of an entry in another
thread for the same memory region.
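A sketch of that sharing path, assuming hypothetical types and a hypothetical share_stack() backend behind shm/mmap: the requester gets a copy of the owner's stack entry with the requested permissions, and the owner's context is only read, never written. Rejecting requests for permissions the owner itself lacks is an extra assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { PERM_R = 1u << 0, PERM_W = 1u << 1 };

#define MAX_ENTRIES 4  /* hypothetical per-thread limit */

typedef struct {
  uintptr_t base;
  size_t    size;
  unsigned  perms;
} mem_entry;

typedef struct {
  size_t    count;
  mem_entry entries[MAX_ENTRIES];  /* entries[0] is the thread's own stack */
} thread_mem_ctx;

/* Add the owner's stack to the requester's context with requested_perms.
 * Returns 0 on success, -1 if the requester is full or asks for more than the owner has. */
static int share_stack(const thread_mem_ctx *owner, thread_mem_ctx *requester,
                       unsigned requested_perms)
{
  assert(owner != NULL && requester != NULL && owner->count > 0);
  const mem_entry *stack = &owner->entries[0];
  if (requester->count == MAX_ENTRIES)
    return -1;
  if ((requested_perms & ~stack->perms) != 0)
    return -1;
  requester->entries[requester->count++] =
      (mem_entry){ stack->base, stack->size, requested_perms };
  return 0;
}
```

Because the entry is copied rather than referenced, a read-only mapping in one thread cannot weaken or strengthen the original thread's read/write access.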

I lean toward thinking it is better to always pay for the TLB misses at
the context switch, which might mean synthesizing accesses to entries
that might have been evicted, if the hardware restricts software's
ability to install or manipulate TLB entries directly. That is
something worth looking at more, though. There is definitely a
tradeoff between predictable costs and throughput performance. It
might be worth implementing both approaches.
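A toy cost model of the two approaches, assuming every per-thread entry has been evicted before the switch. The eager policy synthesizes an access for every entry at switch time (predictable, but paid even for entries the thread never uses); the lazy policy pays a demand miss only for entries the thread actually touches. The names and the model are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
  size_t switch_misses; /* misses paid inside the context switch */
  size_t run_misses;    /* misses paid while the thread runs */
} miss_counts;

typedef enum { POLICY_EAGER, POLICY_LAZY } reload_policy;

/* used[i] says whether the thread touches entry i during its time slice. */
static miss_counts simulate(reload_policy policy, const bool *used, size_t n_entries)
{
  miss_counts c = { 0, 0 };
  assert(used != NULL || n_entries == 0);
  for (size_t i = 0; i < n_entries; ++i) {
    if (policy == POLICY_EAGER)
      c.switch_misses++;  /* synthesized access at switch time, used or not */
    else if (used[i])
      c.run_misses++;     /* demand miss on first use */
  }
  return c;
}
```

Eager gives a fixed, analyzable switch cost; lazy does less total work when threads touch only part of their mappings.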

Gedare