Performance problem with PyYAML

Sebastian Huber sebastian.huber at embedded-brains.de
Fri Nov 8 08:26:51 UTC 2019


Hello,

I added the build specifications for most of the test programs. This 
resulted in 656 *.yml files. It seems this is a bit too much for the 
PyYAML module, which is written purely in Python. It needs three to 
four seconds on my machine to load the files. The BSPs will add another 
couple of hundred files. Converting the format to JSON solves the 
performance issue: the load time drops to 0.2s to 0.3s. JSON has the 
additional benefit that its parser is a standard Python library 
module.
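
The measurement above could be reproduced with something like the 
following. This is only a minimal sketch: the helper names load_items() 
and timed_load() are made up for illustration and are not part of the 
RTEMS build system, and the directory layout is an assumption.

```python
import json
import time
from pathlib import Path


def load_items(root, suffix, parse):
    """Parse every file below root whose name ends in suffix.

    parse is any callable that accepts an open text file, for example
    json.load. Returns a dict mapping file path to the parsed item.
    """
    items = {}
    for path in sorted(Path(root).rglob("*" + suffix)):
        with open(path, encoding="utf-8") as f:
            items[str(path)] = parse(f)
    return items


def timed_load(root, suffix, parse):
    """Return (items, elapsed_seconds) for one full load of the tree."""
    start = time.perf_counter()
    items = load_items(root, suffix, parse)
    return items, time.perf_counter() - start
```

Calling timed_load("spec", ".json", json.load) times the JSON variant. 
With PyYAML installed, timed_load("spec", ".yml", yaml.safe_load) should 
time the YAML variant in the same way, since yaml.safe_load also accepts 
an open file.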

There are two problems with JSON:

1. Doorstop currently supports only YAML. I guess support for JSON can 
be added in principle, but it is not a small change.

2. Multi-line strings in JSON are less readable, e.g.

cat spec/build/bsps/riscv/riscv/RTEMS-BUILD-BSP-RISCV-RISCV-004.json
{
   "active": true,
   "build-type": "config-file",
   "content": "MEMORY {\n  RAM : ORIGIN = ${RISCV_RAM_REGION_BEGIN}, LENGTH = ${RISCV_RAM_REGION_SIZE}\n}\n\nREGION_ALIAS (\"REGION_START\", RAM);\nREGION_ALIAS (\"REGION_TEXT\", RAM);\nREGION_ALIAS (\"REGION_TEXT_LOAD\", RAM);\nREGION_ALIAS (\"REGION_FAST_TEXT\", RAM);\nREGION_ALIAS (\"REGION_FAST_TEXT_LOAD\", RAM);\nREGION_ALIAS (\"REGION_RODATA\", RAM);\nREGION_ALIAS (\"REGION_RODATA_LOAD\", RAM);\nREGION_ALIAS (\"REGION_DATA\", RAM);\nREGION_ALIAS (\"REGION_DATA_LOAD\", RAM);\nREGION_ALIAS (\"REGION_FAST_DATA\", RAM);\nREGION_ALIAS (\"REGION_FAST_DATA_LOAD\", RAM);\nREGION_ALIAS (\"REGION_RTEMSSTACK\", RAM);\nREGION_ALIAS (\"REGION_WORK\", RAM);\n\nINCLUDE linkcmds.base\n",
   "derived": false,
   "destination": "${BSP_LIBDIR}/linkcmds",
   "enabled-by": [],
   "header": "",
   "level": 1.3,
   "links": [],
   "normative": true,
   "order": 1000,
   "ref": "",
   "reviewed": "E3oxPkiXxl6OF-CbAPybZ3Uj-yDa-gX0TNlCe8KI_AE=",
   "target": "linkcmds",
   "text": "",
   "type": "build"
}

vs.

cat spec/build/bsps/riscv/riscv/RTEMS-BUILD-BSP-RISCV-RISCV-004.yml
active: true
build-type: config-file
content: |
   MEMORY {
     RAM : ORIGIN = ${RISCV_RAM_REGION_BEGIN}, LENGTH = ${RISCV_RAM_REGION_SIZE}
   }

   REGION_ALIAS ("REGION_START", RAM);
   REGION_ALIAS ("REGION_TEXT", RAM);
   REGION_ALIAS ("REGION_TEXT_LOAD", RAM);
   REGION_ALIAS ("REGION_FAST_TEXT", RAM);
   REGION_ALIAS ("REGION_FAST_TEXT_LOAD", RAM);
   REGION_ALIAS ("REGION_RODATA", RAM);
   REGION_ALIAS ("REGION_RODATA_LOAD", RAM);
   REGION_ALIAS ("REGION_DATA", RAM);
   REGION_ALIAS ("REGION_DATA_LOAD", RAM);
   REGION_ALIAS ("REGION_FAST_DATA", RAM);
   REGION_ALIAS ("REGION_FAST_DATA_LOAD", RAM);
   REGION_ALIAS ("REGION_RTEMSSTACK", RAM);
   REGION_ALIAS ("REGION_WORK", RAM);

   INCLUDE linkcmds.base
derived: false
destination: ${BSP_LIBDIR}/linkcmds
enabled-by: []
header: ''
level: 1.3
links: []
normative: true
order: 1000
ref: ''
reviewed: E3oxPkiXxl6OF-CbAPybZ3Uj-yDa-gX0TNlCe8KI_AE=
target: linkcmds
text: ''
type: build
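
The readability difference comes from how JSON has to encode a 
multi-line string: it has no block scalar, so every newline becomes \n 
and every double quote becomes \" inside one physical line. A small 
sketch with a shortened excerpt of the linkcmds content from above:

```python
import json

# Shortened excerpt of the "content" value shown above.
content = (
    'MEMORY {\n'
    '  RAM : ORIGIN = ${RISCV_RAM_REGION_BEGIN}, '
    'LENGTH = ${RISCV_RAM_REGION_SIZE}\n'
    '}\n'
    '\n'
    'REGION_ALIAS ("REGION_START", RAM);\n'
)

# json.dumps() escapes the newlines and quotes, so the encoded item
# contains no literal line break at all.
encoded = json.dumps({"content": content})
print(encoded)

# Decoding gives back exactly the original multi-line text.
decoded = json.loads(encoded)["content"]
```

The data survives the round trip unchanged. Only the on-disk 
representation that a person has to read and edit differs.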

An alternative to using JSON would be the addition of a post-processed 
file which gathers all build specification items included in the RTEMS 
sources. PyYAML is then only necessary if external build specification 
items are used (there should not be hundreds of these). For example, we 
could store the information of all the build specification items in a 
file generated by the Python marshal module. Each time a build 
specification item is added, changed, or removed, we would have to 
update this file as well (it would be stored in the repository).
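
A minimal sketch of such a post-processed file, assuming the items have 
already been loaded into a dictionary of plain Python values. The 
function names write_cache() and read_cache() are hypothetical and only 
serve as illustration.

```python
import marshal


def write_cache(items, cache_path):
    """Serialize the dictionary of build specification items.

    marshal handles the plain types used in the items: dict, list,
    str, bool, int and float.
    """
    with open(cache_path, "wb") as f:
        marshal.dump(items, f)


def read_cache(cache_path):
    """Load all items at once, without touching any YAML parser."""
    with open(cache_path, "rb") as f:
        return marshal.load(f)
```

One thing to keep in mind: the Python documentation states that the 
marshal format is specific to Python and not guaranteed to be stable 
across Python versions, so a file stored in the repository may have to 
be regenerated or written with an explicitly chosen format version.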

-- 
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.huber at embedded-brains.de
PGP     : Public key available on request.

This message is not a business communication within the meaning of the EHUG.

