ZynqMP APU RAM Start

Kinsey Moore kinsey.moore at oarcorp.com
Tue May 14 19:42:49 UTC 2024


On Tue, May 14, 2024 at 10:39 AM Sebastian Huber <
sebastian.huber at embedded-brains.de> wrote:

> On 14.05.24 17:11, Kinsey Moore wrote:
> > On Tue, May 14, 2024 at 1:28 AM Chris Johns <chrisj at rtems.org
> > <mailto:chrisj at rtems.org>> wrote:
> >
> >     On 14/5/2024 4:04 pm, Sebastian Huber wrote:
> >      > Hello,
> >      >
> >      > the ZynqMP APU RAM start addresses are far away from 0x0:
> >      >
> >      > cat spec/build/bsps/aarch64/xilinx-zynqmp/optramori.yml
> >      > SPDX-License-Identifier: CC-BY-SA-4.0 OR BSD-2-Clause
> >      > actions:
> >      > - get-integer: null
> >      > - assert-uint32: null
> >      > - env-assign: null
> >      > - format-and-define: null
> >      > build-type: option
> >      > copyrights:
> >      > - Copyright (C) 2020 On-Line Applications Research (OAR)
> >      > default:
> >      > - enabled-by:
> >      >   - aarch64/xilinx_zynqmp_lp64_a53
> >      >   - aarch64/xilinx_zynqmp_ilp32_zu3eg
> >      >   - aarch64/xilinx_zynqmp_lp64_cfc400x
> >      >   - aarch64/xilinx_zynqmp_lp64_zu3eg
> >      >   value: 0x10000000
> >      > - enabled-by: true
> >      >   value: 0x40018000
> >      > description: |
> >      >   base address of memory area available to the BSP
> >      > enabled-by: true
> >      > format: '{:#010x}'
> >      > links: []
> >      > name: BSP_XILINX_ZYNQMP_RAM_BASE
> >      > type: build
> >      >
> >      > What is the rationale for doing this? Any objections to change
> >     the start address
> >      > to 0x0?
> >     Not from me but existing workflows will break. Some things to keep
> >     in mind:
> >
> >     What is the default address used by Linux on this board and what
> >     uboot expects?
> >
> >     What do the Xilinx tools default to?
> >
> > The load addresses here were taken from other examples when I was first
> > writing this port.
> >
> > The QEMU load address is largely irrelevant since it reads it from the
> > ELF headers and locates it appropriately without other constraints.
> >
> > The address used on hardware is due to u-boot typically loading at
> > 0x8000000, the RTEMS ELF being initially loaded in lower RAM space, and
> > then u-boot unpacking RTEMS into 0x10000000. Everything can be moved
> > around, of course.
>
> Since the RPU cannot access the DDR RAM at 0x0, I suggest to locate the
> APU RAM at 0x0 and use half the size of the DDR RAM for the APU by
> default in the linker command file.
>

So the default RAM would be 256MB instead of the current 512MB. This seems
reasonable and should be sufficient for any tests I'm aware of.

Regarding moving the code to 0x0: that would break null-pointer detection.
The vector table is currently mapped RWX because AArch64 leaves room for 32
instructions per vector entry and the vector target is stored in that space
alongside the vector entry preamble. It could be made RX, but that is work
yet to be done.
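
For illustration, here is a rough sketch of the layout in question (this is
not the actual RTEMS vector table source, the names are made up, and register
save/restore is omitted). Each entry is aligned to 0x80 bytes and the handler
address is stored as a data word in the same executable region, which is why
the page currently needs both write and execute permission:

__asm__ (
"        .balign 0x800\n"  /* VBAR_ELx requires at least 2KiB alignment  */
"vector_table_sketch:\n"
"        .balign 0x80\n"   /* 32 instructions (0x80 bytes) per vector    */
"curr_el_spx_sync_sketch:\n"
"        ldr     x0, 1f\n" /* literal load of the handler address        */
"        br      x0\n"     /* jump to the handler                        */
"1:      .dword  0\n"      /* handler pointer stored in the same page as */
                           /* the code above, hence the RWX mapping      */
);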


> >
> >      > What is the MMU page size used by the BSPs? Would it be possible
> >     to add a NULL
> >      > pointer protection page?
> > The MMU translation table page size is 4KB (0x1000) and the granularity
> > is also 4KB. This will likely need to become more flexible for modern
> > chips that drop 4K page size support as 16KB and 64KB become more
> > common. The 0x0 area is unmapped by default and so throws data aborts on
> > attempted access.
>
> Since these boards usually have lots of DDR RAM available, I would
> switch to a 64KiB page size to reduce the amount of page table reloads
> from RAM. This would waste 64KiB for the NULL pointer protection and up
> to 128KiB at the text/read-only and read-only/read-write boundaries.
>

The number of page table reloads depends on how granular the mappings are.
Large block mappings consume only a few upper-level entries instead of
mapping each individual 4KB granule. RTEMS doesn't generally modify
mappings, so entries created by translation table walks are only rarely
invalidated from the TLB. That said, supporting 64KB translation granules
is worth the effort given the direction newer chips are going.
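
As a back-of-the-envelope example (a sketch only; the block sizes come from
the ARMv8-A VMSA and the 512MiB figure is just the current default RAM size),
mapping 512MiB with level-2 blocks versus individual pages:

#include <stdio.h>

int main(void)
{
  unsigned long long ram = 512ULL << 20;  /* current 512MiB default RAM */

  /* 4KiB granule: one level-2 block descriptor maps 2MiB */
  printf("4K granule,  2MiB L2 blocks:   %llu entries\n", ram / (2ULL << 20));
  /* 4KiB granule: individual level-3 page descriptors map 4KiB each */
  printf("4K granule,  4KiB L3 pages:    %llu entries\n", ram / (4ULL << 10));
  /* 64KiB granule: one level-2 block descriptor maps 512MiB */
  printf("64K granule, 512MiB L2 blocks: %llu entries\n", ram / (512ULL << 20));
  return 0;
}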

Be aware that we can't move everything over to 64KB blindly, as there is no
guarantee of support for any particular translation granule size; 4KB, 16KB,
and 64KB are optional and independently supported, so any combination could
exist. ZynqMP in particular supports 4KB and 64KB translation granules, and
I'm aware of at least one chip that, for practical purposes, only supports
16KB. QEMU's Cortex-A53 model contradicts the Cortex-A53 TRM on several
points of MMU support: it advertises all granule sizes, reports the wrong
ASID size, and has other discrepancies, though I'm sure QEMU does actually
support those modes of operation.

If support for dynamic detection and configuration of granule size is
desired, operation on QEMU will be even less representative of the ability
to run on real hardware.
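
If we do go that route, here is a minimal sketch of the probing itself,
assuming the ID_AA64MMFR0_EL1 field encodings documented in the Arm ARM
(none of these helpers exist in RTEMS today):

#include <stdbool.h>
#include <stdint.h>

static inline uint64_t read_id_aa64mmfr0(void)
{
  uint64_t val;

  __asm__ volatile ("mrs %0, ID_AA64MMFR0_EL1" : "=r" (val));
  return val;
}

/* TGran4 [31:28] and TGran64 [27:24] read 0xF when the granule is not
 * implemented; TGran16 [23:20] reads 0x0 when it is not implemented. */
static bool granule_4k_supported(void)
{
  return ((read_id_aa64mmfr0() >> 28) & 0xF) != 0xF;
}

static bool granule_16k_supported(void)
{
  return ((read_id_aa64mmfr0() >> 20) & 0xF) != 0x0;
}

static bool granule_64k_supported(void)
{
  return ((read_id_aa64mmfr0() >> 24) & 0xF) != 0xF;
}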

Kinsey