[RFC] generic CAN/CAN FD susbsytem for RTEMS from scratch - online documentation

Tue May 14 08:10:32 UTC 2024

On 2024-05-13 17:40, Christian MAUDERER wrote:
> Hello Pavel and Michal,
> 
> sorry for the late reply. I was on vacation last week.
> 
> On 2024-05-06 11:27, Pavel Pisa wrote:
>> Dear Christian,
>>
>> On Tuesday 30 of April 2024 08:40:43 Christian MAUDERER wrote:
>>>> For others, code under review hosted in CTU university GitLab
>>>> server
>>>>     https://gitlab.fel.cvut.cz/otrees/rtems/rtems-canfd
>>>> Documentation
>>>>     
>>>> https://otrees.pages.fel.cvut.cz/rtems/rtems-canfd/doc/can/can-html/can.html
>>>>     
>>>> https://otrees.pages.fel.cvut.cz/rtems/rtems-canfd/doc/doxygen/html/index.html
>>>>
>>>> Main developer behind extension to CAN FD and switch to RTEMS
>>>> is Michal Lenc.
>>>>
>>>> The intention is to (hopefully) reach state when it meets criteria
>>>> to mainlining int RTEMS CPU kit under
>>>>
>>>>     cpukit/dev/can
>> ...
>>>> I agree, that it is compromise. But adding yet another file descriptor
>>>> like multiplexor for queues to each file descriptor seems to me as
>>>> too much complexity. But it can be added. even later as IOCTL to remove
>>>> individual queues based on CAN ID matches or queues IDs if create
>>>> is modified to return internal queue IDs...
>>>
>>> I somehow missed that you can open the device multiple times and get
>>> independent queues. With that, it's completely OK and should be flexible
>>> enough for most applications.
>>>
>>> It's great that you already have put some thought into how it could be
>>> extended later if some application needs more flexibility.
>> ...
>>>>> Did you check with
>>>>> some other hardware controller, whether the whole structures / defines
>>>>> / flags close to the hardware do work well for other controllers too?
>>>>
>>>> The code/concept is based on my previous LinCAN and OrtCAN work
>>>>
>>>> https://ortcan.sourceforge.net/lincan/
>> ...
>>> I didn't want to doubt your competence. Like I said it's some trap that
>>> I have fallen into often enough myself (like when guiding Prashanths
>>> GSoC project). But it's clear that you have put a lot of thought into
>>> that. So I would expect that there shouldn't be much trouble with most
>>> controllers. Maybe except for the ones where a semiconductor vendor
>>> thought it would be a good idea to create a completely different
>>> concept. But these are always difficult.
>>
>> I agree with discussion and searching for hard arguments.
>>
>> The solution is compromise and in general CAN bus concept
>> is optimized for direct replacement of wires in car
>> going between distinc units and its use as general
>> communication solution has some difficulties and requires
>> some compromises.
>>
>> For small devices with predefined purpose and Autosar,
>> it is ideal to allocate for each CAN ID (wire signal)
>> to be sent one communication object on the controller.
>> Same for each received signal value or their set in the
>> single frame. The most controllers are equipped by filters
>> and mechanism to do so including selection of the
>> Tx message object for physical bus-link arbitration
>> according to the priority. Then sending side updates
>> signal value in corresponding Tx object and receiving
>> side sees most actual one usually on the best effort basis,
>> older unread frames are overwritten by updated value.
>>
>> But even in simple ECU, there are obstacles to use this
>> principle in all kind of the communication. CAN bus is used
>> for firmware updates and general configuration. In this
>> case, the reliable delivery of all messages with given
>> CAN ID is required because whole sequence has to be
>> received and processed and the state evolution is associated
>> to the sequence. If a single message is lost, then all
>> data are unusable. Because sequence requires exact ordering
>> it is typical that only single Tx object is used. On Rx
>> side there can be problem to capture all frames without
>> overwrite by single Rx object so some controllers ad FIFO
>> which can be attached to each object or some mechanism
>> how to allocate more Rx objects and pass them to the user
>> in FIFO order.
>>
>> That works for small ECUs with single purpose firmware.
>> But on general purpose operating system which should
>> allow even complete monitoring of the CAN bus, allows
>> dynamically started applications and even whole virtual
>> CAN/CANopen nodes, allocation the controller Tx/Rx message
>> objects for each specific purpose is impossible.
>>
>> That is why all generic CAN subsystems which I know
>> (CAN4Linux, LinCAN, SockteCAN, NuttX char device CAN,
>> windows Peak's drivers etc.) define API based on
>> opening driver and presenting received messages
>> in FIFO order to application (with options for software
>> filtering but usually not propagated to controller,
>> HW - LinCAN has some option to union user FIFOs to
>> mask and ID propagated to HW, but you usually end with
>> fully end with need to receve all anyway and it has not been
>> used at the end). The Tx FIFO order is required for messages
>> with same ID or even sometimes between same stream of mesages
>> even wit altering ID for correct realization of some higher
>> level protocols.
>>
>> The result is that even on hardware equipped with multiple
>> Tx objects but without special Tx FIFO order preserving
>> cyclic queue only single Tx object is used to realize
>> transmission of all messages, for example SocketCAN on
>> XCAN controller. So only part of the CAN bus media
>> badwidth can be utilized by single node. May it be, it is sometimes
>> a luck, because CAN IDs are not correctly allocated according
>> to priority even on cars critical subsystems. On the Rx side original
>> buffers approach is hard to use in order preserving FIFO concept,
>> but the most of today controllers add some option to keep order
>> and leave processing and distribution on software side.
>> See evolution from CCAN to DCAN to overcome that problem.
>> We have even made LinCAN for CCAN many many years ago
>> which somehow kept required properties but it was headache.
>>
>> So back to generic OS can interfaces, all I know are FIFO(s)
>> based. Most of them keep strict FIFO order on Tx side
>> which results in HoL (head-of-the-line) blocking and priority
>> inversion on bus loaded by middle priority from other node.
>>
>> That is why SocketCAN adds alloc_candev_mqs (multiple-queues) alternative
>> for drivers
>>
>> https://elixir.bootlin.com/linux/latest/source/drivers/net/can/dev/dev.c#L249
>>
>> but as I know, no mainline kernel driver is using that.
>> We have done some work to research and even a little extend
>> Linux networking QoS subsystem to solve buffer bloat by old
>> messages for traffic requiring best effort (most up to date
>> data for control) for given IDs and to limit badwidth
>> of others or virtual guests connected through QEMU to
>> physical bus etc. may years ago at time when multi-queue
>> has not been available on Linux side. I have long time plan
>> to extend CTU CAN FD mainline Linux driver for this support
>> and probably to be the first example how to overcome HoL/priority
>> inversion in Linux CAN subsystem. It has been planned in original
>> LinCAN before SoketCAN and it is now implemented in proposed
>> RTEMS CAN/FD framework where application can setup multiple
>> queues even for single open instance with different Tx priority
>> class and when used and mapped correctly to CAN IDs, it can
>> prevent priority inversion. It is not generic, because it is
>> quite expensive for deeper FIFOs and even mutual order of
>> Tx messages has to be preserved for many protocols as discussed
>> earlier. CTU CAN FD IP core interface to software has been architected
>> by me to allow maximal utilization of the Tx buffers and their
>> reallocation when needed for higher priority message.
>> Wait for DTP processing and publication of our international CAN
>> Conference 2024 article or come and meet next week in Baden-Baden
>>
>>    https://www.can-cia.org/icc/
>>
>> There are two branches of the thought from this point
>>
>> 1) how it maps to other controllers
>>
>> For these equipped by single Tx object only (i.e. SJA1000),
>> it maps well because attempt to repeat Tx and arbitration
>> can be disabled when higher priority queue becomes ready
>> and our CAN infrastructure allows to push back lower
>> priority message and schedule higher one to be sent.
>>
>> For more complex one, if they do not allow to control Tx objects
>> order then only single Tx object can be used. Bad, link underutilization,
>> but it is what is standard in SocketCAN and other CAN solutions
>> for general purpose operating systems today. All controllers
>> which I know allows to stop Tx attempt repeat and I hope to
>> seen at all option t check if the latest attempt has been
>> successful or not. So newt RTEMS CAN can use them same
>> as on SJA1000. On Rx side, most have FIFO preserving
>> option to use multiple buffers. Sometimes partially
>> broken, burdened by erratas etc. (like iMX RT where
>> we overcome these problems in NuttX drivers).
>> When number of Tx priority classes is limited (for proposed
>> system by default 3 but compile time configurable) then
>> we can allocate one Tx buffer for each class, easy and
>> preserves HoL priority inversion even on simple controllers.
>> If there is option to order Tx according to the buffer
>> index in the controller, then there is option for a little
>> more performant solution when multiple Tx buffers are allocated
>> for each class and they are sequentially filled till highest
>> allocated buffer index is filled. Then there is some gap till
>> all these buffers in given priority are sent because
>> cyclic filling of the minimal index would result in reordering
>> with possible break of some protocol requirements.
>> Some controllers allows to attach DMA realized FIFOs to more
>> Tx objects, in such case it would map to proposed design well
>> too. Some newer controllers adds local priority bits above
>> CAN ID ones (i.e. new NXP FlexCAN). This could allow cyclic
>> use of some Tx objects/buffers similar to CTU CAN FD.
>> There will be problems because multiple Tx buffers priorities
>> are not reachable by single atomic operation like in CTU CAN FD
>> case. But I have some idea how to implement sequential
>> updates to ensure order in the class. There would be problem,
>> that most controllers do not allow to update this information
>> on the objects participating actively in arbitration. So it would
>> lead to much more acrobation between eggs and some gap time,
>> where none message is offered in the link arbitration even that
>> there are pending user requests will be inevitable in some
>> scenarios after some number of messages sent. That cannot
>> be on the bus side worse that considering fixed order according
>> to index. May be, it can be found that overhead does not worth
>> that. But we preserve API in variants in all cases...
>>
>> 2) use of the CAN bus in applications requiring maximal bus
>> transparency with minimal latency and SW load. This is
>> totally opposite of the general CAN bus subsystem for
>> general purpose RTOS. The API in this case should allocated
>> Tx and Rx controller objects for the individual purposes/CAN IDs.
>> Rx side SW processing can be considered as alternative and proposed
>> framework allows to setup queues, but it has overhead and under
>> extreme load it can lost some messages if HW is not performant enough.
>> On Tx side it is even more problematic.
>>
>> But if this type of use of RTEMS for example for Autosar or Simulink
>> generated code is considered then it is possible to extend actual
>> proposed API by IOCTLs which allows to reserve some controller
>> objects for specific purposes and allows to access them directly
>> for minimal overhead and use under direct application control or attach
>> separated controller side "canque_ends_dev_t" to such objects and
>> propagate them to some clients to standard CAN read and write API.
>>
>> So I think that the proposed framework provides what is expected
>> bu most of general purpose CAN/CAN FD framework users, tries to
>> perpare a little even for come of CAN XL, solves problems which
>> may be practically unsolved by all other generic approaches still.
>> And we have some clue how to extend support for most/all other
>> controllers and even some open doors to offer even ECU style
>> API for applications which benefit from direct controller
>> buffers use/allocation which is possible on controllers
>> with abundant number of buffers (not case of SJA1000
>> and very limited on CTU CAN FD - max 8 can be configured
>> to silicon under actual registers map).
>>
>> I understand that the text is long but you have asked for
>> it in the fact and I provide complete thought dump
>> to analyes it.
> 
> Thanks for the (very) detailed explanation. My intention was to express 
> that I'm completely OK with only one driver because you clearly have 
> thought about other hardware too. Your explanation just makes it even 
> more clear how much thought you put into it ;)
> 
>>
>> I would be happy if you and or others find time to look
>> into actual code implementation to identify what could
>> be issue for mainlining as soon as possible because
>> after May 24 changes do not propagate into Michal Lenc's
>> thesis text which can be alternative and more in depth
>> documentation and analysis than what fits into official
>> RTEMS one. The full document has already 47 pages and
>> 34 of the actual text without content and appendices.
>> Document includes benchmarks under RTEMS load by HTTP
>> traffic, priority inversion prevention confirmation
>> by measurements with performance data etc.
>> It will be published on CTU in May or June
>>    https://dspace.cvut.cz/
>> and links will be added to
>>    https://canbus.pages.fel.cvut.cz/
>> same as for much shorter iCC article and presentation.
>>
> 
> Code review without patches or a review system is always a bit more 
> effort because there is nothing to add comments directly. It seems that 
> I can't register on the gitlab instance that you provided. So let's try 
> it here.
> 
> I'll mainly take a look at the headers because they define the 
> interface. That's the most important part if you ask me. Bugs in the 
> code should be fixable later.
> 
> I'll try to categorize my comments a bit. If it has a *Style* or *Typo* 
> in front of it, you can just ignore it. It's not really important. It's 
> just something that I noted while reading through the code. *Question* 
> or *Note* are more important.
> 
> And please note: You know CAN a lot better than me. So quite possible 
> that I don't see a lot of stuff and that I might have some odd odd 
> questions.
> 
> 
> ### First the ones that I plan to more or less ignore so that I can 
> concentrate on the important parts:
> 
> Test or demo apps. So most likely not relevant for a review:
> 
> ./rtems_can_test/can_test.h
> ./rtems_can_test/app_def.h
> ./rtems_can_test/system.h
> ./rtems_can_test/can_register.h
> ./zynq_template/app_def.h
> ./zynq_template/zynq_reg.h
> ./zynq_template/system.h
> 
> Seems to be left over from some tests:
> 
> ./lib/libbar/bar.h
> 
> Driver specific files. I think these are not that high priority either:
> 
> ./lib/candrv/ctucanfd/ctucanfd_txb.h
> ./lib/candrv/ctucanfd/ctucanfd_kframe.h
> ./lib/candrv/ctucanfd/ctucanfd_kregs.h
> ./lib/candrv/dev/can/ctucanfd.h
> 
> 
> ### Now the more important ones: The interfaces
> 
> #### ./lib/candrv/dev/can/can.h
> 
> *Style*: I would suggest to group defines a bit more. You already used 
> prefixes like RTEMS_CAN_QUEUE_* which is great. You can improve that a 
> bit more if you use Doxygen "@name" and "@{" ... "@}". For an example 
> take a look at
> 
> https://gitlab.rtems.org/rtems/rtos/rtems/-/blob/main/cpukit/include/dev/i2c/i2c.h?ref_type=heads#L80
> 
> Which leads to a group in the doxygen output:
> 
> https://docs.rtems.org/doxygen/branches/master/cpukit_2include_2dev_2i2c_2i2c_8h.html
> 
> The same is true for some other defines in other files. I won't mention 
> it every time.
> 
> 
> *Question*: Why do you prefix some defines with RTEMS (like 
> RTEMS_CAN_CHIP_MODE) and others don't have that prefix (like 
> CAN_CTRLMODE_LOOPBACK)? The same is true for some other defines in other 
> files. I won't mention it every time.
> 
> 
> *Style*: Sometimes you use Doxygen @brief. Sometimes \ref. I think it 
> works, but it's a bit odd.
> 
> 
> *Question*: You have ioctls like RTEMS_CAN_DISCARD_QUEUES. According to 
> the description, that ioctl has a parameter. Why is it an _IO and not an 
> _IOW? The same is true for some more of the _IO ioctls.
> 
> 
> *Typo*: The description of RTEMS_CAN_GET_BITTIMING has a "geets" in it's 
> comment.
> 
> 
> *Note*: struct can_chip_ops doesn't have a description. I'm not entirely 
> sure what every member should do. For example: check_bittiming: From the 
> name I would expect that it only checks a bit timing. From the 
> ctucanfd.c it also sets a bit timing. I think it would be good if you 
> would add short descriptions here like "Check and set a bittiming. 
> Returns 0 on success or negated errno on error."
> 
> 
> *Detail*: can_bus_register(...) and can_bitrate2bittiming are missing a 
> description. If they are a public interface, it would be good if they 
> would have one.
> 
> 
> #### ./lib/candrv/dev/can/can-frame.h
> 
> *Detail*: Description of the can_frame_header. You have field 
> descriptions like "This member holds the CAN ID value". My first thought 
> was that it is some kind of address. But with taking a look at the code, 
> it seems that it is a bit mask that is combined out of CAN_ERR_ID_* 
> defines. If a field is expected to contain a certain group of defines: 
> Can you add a note regarding that?
> 
> 
> *Style*: You have a group of defines like CAN_ERR_*_DATA_BYTE. On the 
> first glance, I thought it would be the same group as CAN_ERR_ID_*. I 
> think I would have used CAN_ERR_DATA_BYTE_* instead. Of course that's a 
> style question and there are always good arguments for any style. Again: 
> A doxygen group using @{ and @} might achieve the same.
> 
> 
> *Typo*: You have a CAN_ERR_PROT_LOC_DARA_BYTE instead of ..._DATA_BYTE.
> 
> 
> *Question*: There are a lot of defines in can-frame.h like 
> CAN_ERR_PROT_LOC_DATA or CAN_ERR_TRX_UNSPEC. Is it clear for someone 
> more used to CAN, how to use these? Or would a description like 
> "Possible values for the Byte xyz in a can message" help?
> 
> 
> #### ./lib/candrv/dev/can/can-filter.h
> 
> *Detail*: Again: I'm not happy with the descriptions of the fields of 
> the structure. A field "flags" that is described as "This member holds 
> CAN flags" isn't really helpful. Which values can I assign to that 
> field? Is it a bit mask? Is it a field defined according to some 
> standard? In that case even a "Holds standard CAN flags" would be useful 
> because then I know that I just have to take a look at any CAN 
> documentation.
> 
> 
> #### ./lib/candrv/dev/can/can-devcommon.h
> 
> OK.
> 
> 
> #### ./lib/candrv/dev/can/can-helpers.h
> 
> *Note*: I don't like global defines like MAX_MSGOBJS without a prefix. 
> That's polluting the name space. Is there a reason that it doesn't have 
> the CAN or RTEMS_CAN prefix like all other defines?
> 
> Similar: There are defines like "BIT". Is there a reason for using such 
> a generic name? If it (for example) helps porting existing drivers from 
> another stack, that's great. Otherwise, I don't like these names.
> 
> Some more in this file are: len2dlc, GENMASK, FIELD_PREP, FIELD_GET
> 
> 
> Now I'm running out of time. I'll try to take a look at the following 
> files later or tomorrow:
> 

Like promised:

#### ./lib/candrv/dev/can/can-queue.h

*Note*: The doxygen documentation of struct canqueue_slot_t didn't work 
as expected: You used for example @next to describe the "next" field. 
That clearly didn't work: 
https://otrees.pages.fel.cvut.cz/rtems/rtems-canfd/doc/doxygen/html/structcanque__slot__t.html

*Question*: You use the Atomic_Uint from rtems/score/atomic.h for the 
slot_flags. For new code, I would suggest using the atomic_uint from 
stdatomic.h instead (C11). You have included that file already, so it 
shouldn't be a problem.

*Question*: Why is string.h included in that header? I don't see any 
str*, mem* or stp* functions used.

*Question*: Some of the functions have a bit of a short description. For 
example the canqueue_filter_match: The brief description basically just 
tells me exactly what the name of the function already told me. But how 
and in what situation would I use that function? How do these filters 
work? I can't tell that easily from the description or from the 
implementation of the function. Is it even thought as an interface that 
a user (in this case: someone writing a driver or an application) has to 
understand?

Maybe I have a basic problem here: Which headers are (more or less) 
public ones (used to write drivers and applications) and which ones are 
internal only? Or in other words: Which headers will be installed?

For this file, I strongly suspect that it is not user-facing. You have a 
lot of undocumented functions in it (like canqueue_next_inedge or 
canque_for_each_inedge).

The files that are really relevant for review at the current point are 
mainly the ones that a user (again: someone writing a driver or an 
application) can see.

If it is a public facing header: Did I miss some general description how 
a filter works and how a user should use it? It's quite possible that I 
missed that. I'm still only scratching the surface of your work at the 
moment.

#### ./lib/candrv/dev/can/can-stats.h

Looks OK.

#### ./lib/candrv/dev/can/can-virtual.h

OK.

#### ./lib/candrv/dev/can/can-bittiming.h

*Detail*: can_bittiming_const has a name with a fixed 32 char length. In 
ctucanfd.c you initialize that with a constant string. Is there some 
reason to have a fixed length string in RAM instead of using a pointer 
to a constant string that can have an arbitrary length and can be (for 
example) in the Flash?

Best regards

Christian

> 
> Best regards
> 
> Christian
> 
>> Best wishes,
>>
>>                  Pavel
>> -- 
>>                  Pavel Pisa
>>
>>      phone:      +420 603531357
>>      e-mail:     pisa at cmp.felk.cvut.cz
>>      Department of Control Engineering FEE CVUT
>>      Karlovo namesti 13, 121 35, Prague 2
>>      university: http://control.fel.cvut.cz/
>>      personal:   http://cmp.felk.cvut.cz/~pisa
>>      company:    https://pikron.com/ PiKRON s.r.o.
>>      Kankovskeho 1235, 182 00 Praha 8, Czech Republic
>>      projects:   https://www.openhub.net/accounts/ppisa
>>      social:     https://social.kernel.org/ppisa
>>      CAN related:http://canbus.pages.fel.cvut.cz/
>>      RISC-V education: https://comparch.edu.cvut.cz/
>>      Open Technologies Research Education and Exchange Services
>>      https://gitlab.fel.cvut.cz/otrees/org/-/wikis/home
> 

-- 
--------------------------------------------
embedded brains GmbH & Co. KG
Herr Christian MAUDERER
Dornierstr. 4
82178 Puchheim
Germany
email:  christian.mauderer at embedded-brains.de
phone:  +49-89-18 94 741 - 18
mobile: +49-176-152 206 08

Registergericht: Amtsgericht München
Registernummer: HRA 117265
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/