Help on how to configure for user-defined memory protection support (GSoC 2020)
Gedare Bloom
gedare at rtems.org
Fri May 29 15:01:02 UTC 2020
On Thu, May 28, 2020 at 6:24 PM Utkarsh Rai <utkarsh.rai60 at gmail.com> wrote:
>
>
>
>
> On Wed, May 27, 2020 at 8:29 PM Gedare Bloom <gedare at rtems.org> wrote:
>>
>> On Tue, May 26, 2020 at 6:12 PM Utkarsh Rai <utkarsh.rai60 at gmail.com> wrote:
>> >
>> >
>> >
>> > On Mon, May 25, 2020 at 9:32 PM Gedare Bloom <gedare at rtems.org> wrote:
>> >>
>> >> On Mon, May 25, 2020 at 5:39 AM Utkarsh Rai <utkarsh.rai60 at gmail.com> wrote:
>> >> >
>> >> >
>> >> > On Fri, May 22, 2020, at 10:59 AM Gedare Bloom <gedare at rtems.org> wrote:
>> >> >>
>> >> >> > This means that our low-level design for providing thread stack protection may look something like this:-
>> >> >> >
>> >> >> > 1. For MPU-based processors, the number of protected stacks will depend on the number of protection domains, i.e. for MPUs with 8 protection domains we can have 7 protected stacks (one region will be assigned to global data). For MMU-based systems we will have a section (a page of size 1MB) for global data, and the task address space will be divided into smaller pages, with page sizes chosen with the number of TLB entries in mind, in the manner I have described above in the thread.
>> >> >> >
>> >> >> There is value to defining a few of the global regions. I'll assume
>> >> >> R/W/X permissions. Then code (.text) should be R/X. read-only data
>> >> >> sections should be grouped together and made R. Data sections should
>> >> >> be RW. And then stacks should be added to the end. The linker scripts
>> >> >> should be used to group the related sections together. I think some
>> >> >> ARM BSPs do some of this already. That seems like a minimally useful
>> >> >> configuration for most users that would care: they also want
>> >> >> protection of code from accidental overwrite, probably data too,
>> >> >> and non-executable data in general. You may also have to consider a
>> >> >> few more permission complications (shared/cacheable) depending on the
>> >> >> hardware.
>> >> >
>> >> >
>> >> > The low-level MMU implementation for ARMv7 BSPs has an 'ARMV7_CP15_START_DEFAULT_SECTIONS' which lists out various regions with appropriate permissions; these are then grouped by a linker script. This should be the standard way of handling the placement of statically allocated regions.
>> >> >
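>> >> > A minimal sketch of what such a grouping looks like, in the style of
>> >> > the ARMv7 CP15 start code (field and flag names are from memory and
>> >> > may differ from the actual BSP headers):
>> >> >
>> >> >   static const arm_cp15_start_section_config sections[] = {
>> >> >     { /* code: readable and executable, never writable */
>> >> >       .begin = (uint32_t) bsp_section_text_begin,
>> >> >       .end   = (uint32_t) bsp_section_text_end,
>> >> >       .flags = ARMV7_MMU_CODE_CACHED
>> >> >     }, { /* read-only data: readable, non-executable */
>> >> >       .begin = (uint32_t) bsp_section_rodata_begin,
>> >> >       .end   = (uint32_t) bsp_section_rodata_end,
>> >> >       .flags = ARMV7_MMU_READ_ONLY_CACHED
>> >> >     }, { /* data and bss: read/write, non-executable */
>> >> >       .begin = (uint32_t) bsp_section_data_begin,
>> >> >       .end   = (uint32_t) bsp_section_bss_end,
>> >> >       .flags = ARMV7_MMU_DATA_READ_WRITE_CACHED
>> >> >     }
>> >> >   };
>> >> >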
>> >> >> > 2. The protection, size, page table, and sharing attributes of each created thread will be tracked.
>> >> >> >
>> >> >> I'd rather we not be calling this a page table. MPU-based systems
>> >> >> don't have a notion of page table. But maybe it is OK as long as we
>> >> >> understand that you mean the data structure responsible for mapping
>> >> >> out the address space. I'm not sure what you mean by size, unless you
>> >> >> refer to that thread's stack.
>> >> >>
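>> >> >> To make "the data structure responsible for mapping out the address
>> >> >> space" concrete, the per-thread bookkeeping could look roughly like
>> >> >> this (all names hypothetical):
>> >> >>
>> >> >>   #include <stdbool.h>
>> >> >>   #include <stddef.h>
>> >> >>   #include <stdint.h>
>> >> >>
>> >> >>   /* Hypothetical per-thread protection attributes; "mapping" stands
>> >> >>    * in for an MMU page table or a set of MPU region descriptors. */
>> >> >>   typedef struct {
>> >> >>     void     *stack_begin;  /* base address of the thread's stack */
>> >> >>     size_t    stack_size;   /* size of the stack region */
>> >> >>     uint32_t  access;       /* R/W/X permissions for the stack */
>> >> >>     void     *mapping;      /* arch-specific address space map */
>> >> >>     bool      shared;       /* another thread mapped this stack */
>> >> >>   } thread_stack_protection;
>> >> >>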
>> >> >> > 3. At every context switch, these attributes will be updated, the static-global regions will be assigned a global ASID and will not change during the switch only the protected regions will be updated.
>> >> >> >
>> >> >> Yes, assuming the hardware supports ASIDs and a global attribute.
>> >> >>
>> >> >> I don't know if you will be able to pin the global entries in
>> >> >> hardware. You'll want to keep an eye out for that. If not, you might
>> >> >> need to do something in software to ensure they don't get evicted
>> >> >> (e.g., touch them all before finishing a context switch assuming LRU
>> >> >> replacement).
>> >> >>
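>> >> >> In pseudocode, the switch path would then be roughly (all names
>> >> >> hypothetical, building on the structure sketched above):
>> >> >>
>> >> >>   /* Hypothetical context-switch hook: global entries stay mapped,
>> >> >>    * only the per-thread protected regions are replaced. */
>> >> >>   void stack_protection_switch(
>> >> >>     thread_stack_protection *executing,
>> >> >>     thread_stack_protection *heir
>> >> >>   )
>> >> >>   {
>> >> >>     /* retire the outgoing thread's private entries */
>> >> >>     mmu_remove_entries(executing->mapping);
>> >> >>
>> >> >>     /* install the heir's stack and anything shared with it */
>> >> >>     mmu_install_entries(heir->mapping);
>> >> >>
>> >> >>     /* if the hardware cannot pin global entries, touch them so
>> >> >>      * they are resident before the heir runs (assuming LRU) */
>> >> >>     preload_global_entries();
>> >> >>   }
>> >> >>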
>> >> >> > 4. Whenever we share stacks, the page table entries of the shared stack, with the access bits as specified by the mmap/shm high-level APIs will be installed to the current thread. This is different from simply providing the page table base address of the shared thread-stack ( what if the user wants to make the shared thread only readable from another thread while the 'original' thread is r/w enabled?) We will also have to update the TLB by installing the shared regions while the global regions remain untouched.
>> >> >> >
>> >> >>
>> >> >> Correct. I think we need to make a design decision whether a stack can
>> >> >> exceed one page. It will simplify things if we can assume that, but it
>> >> >> may limit applications unnecessarily. Have to think on that.
>> >> >
>> >> >
>> >> > If we go with the above assumption, we will need to increase the size of the page, i.e. pages of 16 KiB or 64 KiB. Most applications won't require stacks of this size, which will result in wasted memory for each thread. I think it would be better if we have multiple pages, as most applications will have stacks that fit in a single 4 KiB page anyway.
>> >> >
>> >>
>> >> I mis-typed. I meant I think we can assume stacks fit in one page. It
>> >> would be impossible to deal with otherwise.
>> >>
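>> >> That assumption also implies the stack allocator should hand out
>> >> page-aligned, page-sized stacks so a single TLB/MPU entry covers
>> >> each stack. A minimal sketch with standard POSIX calls (assuming a
>> >> 4 KiB page and ignoring PTHREAD_STACK_MIN):
>> >>
>> >>   #include <pthread.h>
>> >>   #include <stdlib.h>
>> >>
>> >>   #define PAGE_SIZE 4096u  /* assumed MMU page size */
>> >>
>> >>   /* Allocate a page-aligned, one-page stack and attach it to the
>> >>    * thread attributes before pthread_create(). */
>> >>   static void *alloc_protected_stack(pthread_attr_t *attr)
>> >>   {
>> >>     void *stack = NULL;
>> >>
>> >>     if (posix_memalign(&stack, PAGE_SIZE, PAGE_SIZE) == 0) {
>> >>       pthread_attr_setstack(attr, stack, PAGE_SIZE);
>> >>     }
>> >>
>> >>     return stack;
>> >>   }
>> >>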
>> >> >>
>> >> >> The "page table base address" points to the entire structure that maps
>> >> >> out a thread's address space, so you'd have to walk it to find the
>> >> >> entry/entries for its stack. So, definitely not something you'd want
>> >> >> to do.
>> >> >>
>> >> >> The shm/mmap should convey the privileges to the requesting thread
>> >> >> asking to share. This will result in adding the shared entry/entries
>> >> >> to that thread's address space, with the appropriately set
>> >> >> permissions. So, if the entry is created with read-only permission,
>> >> >> then that is how the thread will be sharing. The original thread's
>> >> >> entry should not be modified by the addition of an entry in another
>> >> >> thread for the same memory region.
>> >> >>
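>> >> >> As a sketch of that semantic (hypothetical names again): the owner's
>> >> >> entry is left alone, and the requester gets its own entry for the
>> >> >> same region with reduced rights:
>> >> >>
>> >> >>   /* Thread B maps thread A's stack read-only; A keeps R/W. */
>> >> >>   void share_stack_readonly(
>> >> >>     thread_stack_protection *owner,
>> >> >>     thread_stack_protection *requester
>> >> >>   )
>> >> >>   {
>> >> >>     mmu_install_entry(
>> >> >>       requester->mapping,
>> >> >>       owner->stack_begin,    /* same region, 1:1 pa:va */
>> >> >>       owner->stack_size,
>> >> >>       MMU_ACCESS_READ_ONLY   /* requester's rights only */
>> >> >>     );
>> >> >>     owner->shared = true;    /* owner's own entry untouched */
>> >> >>   }
>> >> >>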
>> >> >> I lean toward thinking it is better to always pay for the TLB miss at
>> >> >> the context switch, which might mean synthesizing accesses to the
>> >> >> entries that might have been evicted in case hardware restricts the
>> >> >> ability of software to install/manipulate TLB entries directly. That is
>> >> >> something worth looking at more though. There is definitely a tradeoff
>> >> >> between predictable costs and throughput performance. It might be
>> >> >> worth implementing both approaches.
>> >> >>
>> >> >> Gedare
>> >> >
>> >> >
>> >> > We also need to consider the cases where the stack sharing would be necessary-
>> >> >
>> >> > - We can have explicit cases where an application gets the attributes of a thread by pthread_attr_getstack() and then accesses this stack from another thread.
>> >> >
>> >> > - An implicit case would be when a thread places the address of an object from its stack onto a message queue and other threads access it; in general, all blocking reads (sockets, files, etc.) will share stacks.
>> >> >
>> >> > This will be documented so that the user first shares the required stacks and then performs the above operations.
>> >> >
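>> >> > In code, the documented pattern would be roughly the following, where
>> >> > share_thread_stack() stands in for whatever mmap/shm-based call the
>> >> > final API ends up providing (hypothetical):
>> >> >
>> >> >   /* Thread B wants to read an object living on thread A's stack. */
>> >> >   pthread_attr_t attr;
>> >> >   void  *stack_addr;
>> >> >   size_t stack_size;
>> >> >
>> >> >   pthread_getattr_np(thread_a, &attr);  /* GNU/newlib extension */
>> >> >   pthread_attr_getstack(&attr, &stack_addr, &stack_size);
>> >> >
>> >> >   /* share first: map A's stack read-only into B's space... */
>> >> >   share_thread_stack(thread_a, PROT_READ);
>> >> >
>> >> >   /* ...and only then dereference pointers received over the queue */
>> >> >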
>> >>
>> >> Yes. It may also be worth thinking whether we can/should "relocate"
>> >> stacks when they get shared and spare TLB entries are low. This would
>> >> be a dynamic way to consolidate regions, while a static way would rely
>> >> on some configuration method to declare ahead of time which stacks may
>> >> be shared, or to require the stack allocator (hook) to manage that
>> >> kind of complexity.
>> >
>> >
>> > Sorry, but I am not sure I clearly understand what you are trying to suggest. Does relocating stacks mean moving them to the same virtual address as the thread stack they are being shared with, but with a different ASID?
>>
>> No. We don't want to break the 1:1 pa:va mappings. That is another
>> design constraint, I suppose.
>>
>> If a user wants to share several sets of task stacks mutually with
>> each other, using the same permissions (e.g., RW), then it would be
>> efficient to pack the sharing tasks together in the same page/segment
>> to use 1 TLB entry for them. This is a thought for an optimization
>> down the road, maybe.
>>
>> Gedare
>
>
> Got it. Now that we are clear on most of the aspects of the low-level design (handling context switches through interrupts remains), I suppose we can decide what the high-level user-configuration design/implementation should look like.
>
> - My idea has been to configure the stack protection mechanism in an application based on the current scheme for configuring a system on RTEMS.
> - We can have a 'CONFIGURE_MPU_STACK_PROT' or a 'CONFIGURE_MMU_STACK_PROT' and a 'CONFIGURE_PROT_NUMBER_STACK' based on the CPU. The number of protected stacks for an MPU-based CPU would be the common minimum across all architectures (most architectures provide at least 8 protection domains); see the sketch after this list.
> - Since the task stacks are allocated from the RTEMS workspace, we can set the workspace size for thread stack allocation depending on whether we have an MPU or an MMU.
>
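> A hypothetical application configuration following the usual
> <rtems/confdefs.h> scheme, using the macro names proposed above (they
> may well change):
>
>   #define CONFIGURE_MMU_STACK_PROT
>   #define CONFIGURE_PROT_NUMBER_STACK 7 /* e.g. 8 domains - 1 global */
>
>   #define CONFIGURE_INIT
>   #include <rtems/confdefs.h>
>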
Maybe this part doesn't make sense to worry about yet. This work will
(hopefully) merge in 6.0, along with the rework of application
configuration. Then things change and we may want to leverage
different approaches.
For now you can roll the application configuration however you want to
get things working. We can iterate and evolve over the summer.
> Parallel to this I have started my implementation for isolating thread stacks; as a first step I will be isolating two blocks of memory with appropriate access permissions. Then I will extend this to thread stacks.
Great.