Performance problem with PyYAML
Gedare Bloom
gedare at rtems.org
Fri Nov 8 15:57:02 UTC 2019
On Fri, Nov 8, 2019 at 1:27 AM Sebastian Huber
<sebastian.huber at embedded-brains.de> wrote:
>
> Hello,
>
> I added the build specifications for most of the test programs. This
> resulted in about 656 *.yml files. It seems this is a bit too much for
> the PyYAML module, which is written purely in Python. It needs three to
> four seconds on my machine to load the files, and the BSPs will add
> another couple of hundred files. Converting the format to JSON solves the
> performance issue: the time to load the JSON files drops to 0.2s to
> 0.3s. JSON has the additional benefit that its parser is a standard
> Python library module.
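The one-off conversion from YAML to JSON could be scripted roughly as below. This is a minimal sketch, assuming PyYAML's yaml.safe_load is an acceptable loader for the items and that each .json file should sit next to its .yml source; convert_tree and load_json_tree are hypothetical helper names, not part of RTEMS or Doorstop.

```python
import json
import os
from typing import Any, Callable, Optional


def convert_tree(root: str, load: Optional[Callable[[str], Any]] = None) -> int:
    """Write a .json copy next to every .yml spec item below root.

    Returns the number of converted files. Without an explicit loader,
    PyYAML's yaml.safe_load is used; it is imported lazily so that only
    the conversion step needs PyYAML.
    """
    if load is None:
        import yaml  # PyYAML, only needed for the conversion itself
        load = yaml.safe_load
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit directories in a stable order
        for name in sorted(filenames):
            if not name.endswith(".yml"):
                continue
            src = os.path.join(dirpath, name)
            with open(src, encoding="utf-8") as f:
                data = load(f.read())
            dst = src[: -len(".yml")] + ".json"
            with open(dst, "w", encoding="utf-8") as f:
                # Sorted keys and a fixed indent keep the output stable
                # across repeated conversions.
                json.dump(data, f, indent=2, sort_keys=True, ensure_ascii=False)
                f.write("\n")
            count += 1
    return count


def load_json_tree(root: str) -> dict:
    """Load all converted items using only the standard library."""
    items = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()
        for name in sorted(filenames):
            if name.endswith(".json"):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8") as f:
                    items[os.path.relpath(path, root)] = json.load(f)
    return items
```

Values that yaml.safe_load turns into non-JSON types (for example YAML timestamps, which become datetime objects) would make json.dump raise a TypeError, so such items would need special handling.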
>
> There are two problems with JSON:
>
> 1. Doorstop currently supports only YAML. I guess support for JSON can
> be added in principle, but it is not a small change.
>
> 2. Multi-line strings in JSON are less readable, e.g.
>
> cat spec/build/bsps/riscv/riscv/RTEMS-BUILD-BSP-RISCV-RISCV-004.json
> {
> "active": true,
> "build-type": "config-file",
> "content": "MEMORY {\n RAM : ORIGIN = ${RISCV_RAM_REGION_BEGIN}, LENGTH = ${RISCV_RAM_REGION_SIZE}\n}\n\nREGION_ALIAS (\"REGION_START\", RAM);\nREGION_ALIAS (\"REGION_TEXT\", RAM);\nREGION_ALIAS (\"REGION_TEXT_LOAD\", RAM);\nREGION_ALIAS (\"REGION_FAST_TEXT\", RAM);\nREGION_ALIAS (\"REGION_FAST_TEXT_LOAD\", RAM);\nREGION_ALIAS (\"REGION_RODATA\", RAM);\nREGION_ALIAS (\"REGION_RODATA_LOAD\", RAM);\nREGION_ALIAS (\"REGION_DATA\", RAM);\nREGION_ALIAS (\"REGION_DATA_LOAD\", RAM);\nREGION_ALIAS (\"REGION_FAST_DATA\", RAM);\nREGION_ALIAS (\"REGION_FAST_DATA_LOAD\", RAM);\nREGION_ALIAS (\"REGION_RTEMSSTACK\", RAM);\nREGION_ALIAS (\"REGION_WORK\", RAM);\n\nINCLUDE linkcmds.base\n",
I think you can make this look nicer by appending a \ after each \n
and breaking the line, e.g.,
"content": "MEMORY {\n\
RAM : ORIGIN = ${RISCV_RAM_REGION_BEGIN}, LENGTH = ${RISCV_RAM_REGION_SIZE}\n\
}\n\
\n\
REGION_ALIAS ...
It is still ugly, but somewhat readable/manageable.
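One caveat on this layout: strict JSON (RFC 8259) defines only the string escapes \", \\, \/, \b, \f, \n, \r, \t and \uXXXX, so a backslash followed by a line break is not a valid escape. The sketch below checks this with Python's standard json module; the sample strings are shortened, hypothetical stand-ins for the linker-script content.

```python
import json

# Proposed layout, shortened: a line inside the string ends with a
# backslash followed by a real line break. The raw string keeps the
# backslash-newline pair exactly as written.
CONTINUED = r'''{"content": "MEMORY {\n\
}\n"}'''

# The same value in the form json.dumps emits: one long line.
ONE_LINE = r'{"content": "MEMORY {\n}\n"}'


def parses(text: str, strict: bool = True) -> bool:
    """Return True if Python's json module accepts text."""
    try:
        json.loads(text, strict=strict)
    except json.JSONDecodeError:
        return False
    return True


print(parses(CONTINUED))                # False: invalid \escape
print(parses(CONTINUED, strict=False))  # False: strict=False only allows raw control characters
print(parses(ONE_LINE))                 # True
print(json.loads(ONE_LINE)["content"] == "MEMORY {\n}\n")  # True
```

So with the standard json module, the continued form would need a custom pre-processing step before parsing, which gives up part of the "standard library only" benefit.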
> "derived": false,
> "destination": "${BSP_LIBDIR}/linkcmds",
> "enabled-by": [],
> "header": "",
> "level": 1.3,
> "links": [],
> "normative": true,
> "order": 1000,
> "ref": "",
> "reviewed": "E3oxPkiXxl6OF-CbAPybZ3Uj-yDa-gX0TNlCe8KI_AE=",
> "target": "linkcmds",
> "text": "",
> "type": "build"
> }
>
> vs.
>
> cat spec/build/bsps/riscv/riscv/RTEMS-BUILD-BSP-RISCV-RISCV-004.yml
> active: true
> build-type: config-file
> content: |
>   MEMORY {
>     RAM : ORIGIN = ${RISCV_RAM_REGION_BEGIN}, LENGTH = ${RISCV_RAM_REGION_SIZE}
>   }
>
>   REGION_ALIAS ("REGION_START", RAM);
>   REGION_ALIAS ("REGION_TEXT", RAM);
>   REGION_ALIAS ("REGION_TEXT_LOAD", RAM);
>   REGION_ALIAS ("REGION_FAST_TEXT", RAM);
>   REGION_ALIAS ("REGION_FAST_TEXT_LOAD", RAM);
>   REGION_ALIAS ("REGION_RODATA", RAM);
>   REGION_ALIAS ("REGION_RODATA_LOAD", RAM);
>   REGION_ALIAS ("REGION_DATA", RAM);
>   REGION_ALIAS ("REGION_DATA_LOAD", RAM);
>   REGION_ALIAS ("REGION_FAST_DATA", RAM);
>   REGION_ALIAS ("REGION_FAST_DATA_LOAD", RAM);
>   REGION_ALIAS ("REGION_RTEMSSTACK", RAM);
>   REGION_ALIAS ("REGION_WORK", RAM);
>
>   INCLUDE linkcmds.base
> derived: false
> destination: ${BSP_LIBDIR}/linkcmds
> enabled-by: []
> header: ''
> level: 1.3
> links: []
> normative: true
> order: 1000
> ref: ''
> reviewed: E3oxPkiXxl6OF-CbAPybZ3Uj-yDa-gX0TNlCe8KI_AE=
> target: linkcmds
> text: ''
> type: build
>
> An alternative to using JSON would be the addition of a post-processed
> file which gathers all build specification items included in the RTEMS
> sources. PyYAML is then only necessary if external build specification
> items are used (there should not be hundreds of these). For example,
> we could store the information of all the build specification items in a
> file generated by the Python marshal module. Each time a build
> specification item is added, changed, or removed, we have to update this
> file as well (it is stored in the repository).
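The marshal-based file described above could look roughly like this. This is a minimal sketch, assuming the items are plain mappings of strings, numbers, booleans, lists and nulls (all of which marshal can serialize) and that paths relative to the spec root are acceptable keys; collect_items, write_cache and read_cache are hypothetical names.

```python
import marshal
import os
from typing import Any, Callable, Dict, Optional


def collect_items(root: str, load: Optional[Callable[[str], Any]] = None,
                  suffix: str = ".yml") -> Dict[str, Any]:
    """Parse every spec item file below root, keyed by its relative path."""
    if load is None:
        import yaml  # PyYAML is only needed when regenerating the file
        load = yaml.safe_load
    items: Dict[str, Any] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            if name.endswith(suffix):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8") as f:
                    items[os.path.relpath(path, root)] = load(f.read())
    return items


def write_cache(items: Dict[str, Any], cache_path: str) -> None:
    # marshal supports dict/list/str/int/float/bool/None; objects such as
    # the datetime values safe_load produces for YAML timestamps would
    # make marshal.dump raise ValueError.
    with open(cache_path, "wb") as f:
        marshal.dump(items, f)


def read_cache(cache_path: str) -> Dict[str, Any]:
    # Reading back needs only the standard library, no YAML parsing.
    with open(cache_path, "rb") as f:
        return marshal.load(f)
```

The marshal documentation notes that its format is Python-specific and may change between Python versions, so a file committed to the repository could need regenerating after a Python upgrade.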
>
What is the expected "churn rate" for the build specification items?
If they do not change much, this approach might work. Otherwise,
merge conflicts in the marshaled binary blob would be impossible to
resolve. As an alternative to keeping it in the repository, could it be
(easily?) generated during the build configuration step, or would that
still be too time-consuming?
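On generating the file during configuration: one way to keep that step cheap is to regenerate only when some item is newer than the generated file, so the slow YAML parse runs only after items change. A minimal sketch, assuming the generated file lives outside the spec tree and that file modification times are reliable; cache_is_stale is a hypothetical helper.

```python
import os


def cache_is_stale(root: str, cache_path: str, suffix: str = ".yml") -> bool:
    """Return True if the generated file is missing or older than any item."""
    try:
        cache_mtime = os.stat(cache_path).st_mtime
    except FileNotFoundError:
        return True  # nothing generated yet
    for dirpath, _dirnames, filenames in os.walk(root):
        # A directory's mtime changes when entries are added or removed,
        # which catches new and deleted item files as well.
        if os.stat(dirpath).st_mtime > cache_mtime:
            return True
        for name in filenames:
            if name.endswith(suffix):
                if os.stat(os.path.join(dirpath, name)).st_mtime > cache_mtime:
                    return True
    return False
```

Checking timestamps only stats the files instead of parsing them, so the common case (no item changed) should cost far less than a full load.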
The JSON approach seems the better route, if JSON support can be added to Doorstop.
> --
> Sebastian Huber, embedded brains GmbH
>
> Address : Dornierstr. 4, D-82178 Puchheim, Germany
> Phone : +49 89 189 47 41-16
> Fax : +49 89 189 47 41-09
> E-Mail : sebastian.huber at embedded-brains.de
> PGP : Public key available on request.
>
> Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.
> _______________________________________________
> devel mailing list
> devel at rtems.org
> http://lists.rtems.org/mailman/listinfo/devel