[RFC] generic CAN/CAN FD susbsytem for RTEMS from scratch - online documentation

Michal Lenc michallenc at seznam.cz
Tue May 14 08:43:23 UTC 2024


Hello Christian,

thank you for the thorough review. I am currently at the international
CAN Conference at Baden-Baden, so I will address this once I return by
the end of the week.

Best regards,

Michal Lenc

On 14. 05. 24 10:10, Christian MAUDERER wrote:
> On 2024-05-13 17:40, Christian MAUDERER wrote:
>> Hello Pavel and Michal,
>>
>> sorry for the late reply. I was on vacation last week.
>>
>> On 2024-05-06 11:27, Pavel Pisa wrote:
>>> Dear Christian,
>>>
>>> On Tuesday 30 of April 2024 08:40:43 Christian MAUDERER wrote:
>>>>> For others, code under review hosted in CTU university GitLab
>>>>> server
>>>>>     https://gitlab.fel.cvut.cz/otrees/rtems/rtems-canfd
>>>>> Documentation
>>>>>    
>>>>> https://otrees.pages.fel.cvut.cz/rtems/rtems-canfd/doc/can/can-html/can.html
>>>>>    
>>>>> https://otrees.pages.fel.cvut.cz/rtems/rtems-canfd/doc/doxygen/html/index.html
>>>>>
>>>>> Main developer behind extension to CAN FD and switch to RTEMS
>>>>> is Michal Lenc.
>>>>>
>>>>> The intention is to (hopefully) reach state when it meets criteria
>>>>> to mainlining int RTEMS CPU kit under
>>>>>
>>>>>     cpukit/dev/can
>>> ...
>>>>> I agree, that it is compromise. But adding yet another file
>>>>> descriptor
>>>>> like multiplexor for queues to each file descriptor seems to me as
>>>>> too much complexity. But it can be added. even later as IOCTL to
>>>>> remove
>>>>> individual queues based on CAN ID matches or queues IDs if create
>>>>> is modified to return internal queue IDs...
>>>>
>>>> I somehow missed that you can open the device multiple times and get
>>>> independent queues. With that, it's completely OK and should be
>>>> flexible
>>>> enough for most applications.
>>>>
>>>> It's great that you already have put some thought into how it could be
>>>> extended later if some application needs more flexibility.
>>> ...
>>>>>> Did you check with
>>>>>> some other hardware controller, whether the whole structures /
>>>>>> defines
>>>>>> / flags close to the hardware do work well for other controllers
>>>>>> too?
>>>>>
>>>>> The code/concept is based on my previous LinCAN and OrtCAN work
>>>>>
>>>>> https://ortcan.sourceforge.net/lincan/
>>> ...
>>>> I didn't want to doubt your competence. Like I said it's some trap
>>>> that
>>>> I have fallen into often enough myself (like when guiding Prashanths
>>>> GSoC project). But it's clear that you have put a lot of thought into
>>>> that. So I would expect that there shouldn't be much trouble with most
>>>> controllers. Maybe except for the ones where a semiconductor vendor
>>>> thought it would be a good idea to create a completely different
>>>> concept. But these are always difficult.
>>>
>>> I agree with discussion and searching for hard arguments.
>>>
>>> The solution is compromise and in general CAN bus concept
>>> is optimized for direct replacement of wires in car
>>> going between distinc units and its use as general
>>> communication solution has some difficulties and requires
>>> some compromises.
>>>
>>> For small devices with predefined purpose and Autosar,
>>> it is ideal to allocate for each CAN ID (wire signal)
>>> to be sent one communication object on the controller.
>>> Same for each received signal value or their set in the
>>> single frame. The most controllers are equipped by filters
>>> and mechanism to do so including selection of the
>>> Tx message object for physical bus-link arbitration
>>> according to the priority. Then sending side updates
>>> signal value in corresponding Tx object and receiving
>>> side sees most actual one usually on the best effort basis,
>>> older unread frames are overwritten by updated value.
>>>
>>> But even in simple ECU, there are obstacles to use this
>>> principle in all kind of the communication. CAN bus is used
>>> for firmware updates and general configuration. In this
>>> case, the reliable delivery of all messages with given
>>> CAN ID is required because whole sequence has to be
>>> received and processed and the state evolution is associated
>>> to the sequence. If a single message is lost, then all
>>> data are unusable. Because sequence requires exact ordering
>>> it is typical that only single Tx object is used. On Rx
>>> side there can be problem to capture all frames without
>>> overwrite by single Rx object so some controllers ad FIFO
>>> which can be attached to each object or some mechanism
>>> how to allocate more Rx objects and pass them to the user
>>> in FIFO order.
>>>
>>> That works for small ECUs with single purpose firmware.
>>> But on general purpose operating system which should
>>> allow even complete monitoring of the CAN bus, allows
>>> dynamically started applications and even whole virtual
>>> CAN/CANopen nodes, allocation the controller Tx/Rx message
>>> objects for each specific purpose is impossible.
>>>
>>> That is why all generic CAN subsystems which I know
>>> (CAN4Linux, LinCAN, SockteCAN, NuttX char device CAN,
>>> windows Peak's drivers etc.) define API based on
>>> opening driver and presenting received messages
>>> in FIFO order to application (with options for software
>>> filtering but usually not propagated to controller,
>>> HW - LinCAN has some option to union user FIFOs to
>>> mask and ID propagated to HW, but you usually end with
>>> fully end with need to receve all anyway and it has not been
>>> used at the end). The Tx FIFO order is required for messages
>>> with same ID or even sometimes between same stream of mesages
>>> even wit altering ID for correct realization of some higher
>>> level protocols.
>>>
>>> The result is that even on hardware equipped with multiple
>>> Tx objects but without special Tx FIFO order preserving
>>> cyclic queue only single Tx object is used to realize
>>> transmission of all messages, for example SocketCAN on
>>> XCAN controller. So only part of the CAN bus media
>>> badwidth can be utilized by single node. May it be, it is sometimes
>>> a luck, because CAN IDs are not correctly allocated according
>>> to priority even on cars critical subsystems. On the Rx side original
>>> buffers approach is hard to use in order preserving FIFO concept,
>>> but the most of today controllers add some option to keep order
>>> and leave processing and distribution on software side.
>>> See evolution from CCAN to DCAN to overcome that problem.
>>> We have even made LinCAN for CCAN many many years ago
>>> which somehow kept required properties but it was headache.
>>>
>>> So back to generic OS can interfaces, all I know are FIFO(s)
>>> based. Most of them keep strict FIFO order on Tx side
>>> which results in HoL (head-of-the-line) blocking and priority
>>> inversion on bus loaded by middle priority from other node.
>>>
>>> That is why SocketCAN adds alloc_candev_mqs (multiple-queues)
>>> alternative
>>> for drivers
>>>
>>> https://elixir.bootlin.com/linux/latest/source/drivers/net/can/dev/dev.c#L249
>>>
>>>
>>> but as I know, no mainline kernel driver is using that.
>>> We have done some work to research and even a little extend
>>> Linux networking QoS subsystem to solve buffer bloat by old
>>> messages for traffic requiring best effort (most up to date
>>> data for control) for given IDs and to limit badwidth
>>> of others or virtual guests connected through QEMU to
>>> physical bus etc. may years ago at time when multi-queue
>>> has not been available on Linux side. I have long time plan
>>> to extend CTU CAN FD mainline Linux driver for this support
>>> and probably to be the first example how to overcome HoL/priority
>>> inversion in Linux CAN subsystem. It has been planned in original
>>> LinCAN before SoketCAN and it is now implemented in proposed
>>> RTEMS CAN/FD framework where application can setup multiple
>>> queues even for single open instance with different Tx priority
>>> class and when used and mapped correctly to CAN IDs, it can
>>> prevent priority inversion. It is not generic, because it is
>>> quite expensive for deeper FIFOs and even mutual order of
>>> Tx messages has to be preserved for many protocols as discussed
>>> earlier. CTU CAN FD IP core interface to software has been architected
>>> by me to allow maximal utilization of the Tx buffers and their
>>> reallocation when needed for higher priority message.
>>> Wait for DTP processing and publication of our international CAN
>>> Conference 2024 article or come and meet next week in Baden-Baden
>>>
>>>    https://www.can-cia.org/icc/
>>>
>>> There are two branches of the thought from this point
>>>
>>> 1) how it maps to other controllers
>>>
>>> For these equipped by single Tx object only (i.e. SJA1000),
>>> it maps well because attempt to repeat Tx and arbitration
>>> can be disabled when higher priority queue becomes ready
>>> and our CAN infrastructure allows to push back lower
>>> priority message and schedule higher one to be sent.
>>>
>>> For more complex one, if they do not allow to control Tx objects
>>> order then only single Tx object can be used. Bad, link
>>> underutilization,
>>> but it is what is standard in SocketCAN and other CAN solutions
>>> for general purpose operating systems today. All controllers
>>> which I know allows to stop Tx attempt repeat and I hope to
>>> seen at all option t check if the latest attempt has been
>>> successful or not. So newt RTEMS CAN can use them same
>>> as on SJA1000. On Rx side, most have FIFO preserving
>>> option to use multiple buffers. Sometimes partially
>>> broken, burdened by erratas etc. (like iMX RT where
>>> we overcome these problems in NuttX drivers).
>>> When number of Tx priority classes is limited (for proposed
>>> system by default 3 but compile time configurable) then
>>> we can allocate one Tx buffer for each class, easy and
>>> preserves HoL priority inversion even on simple controllers.
>>> If there is option to order Tx according to the buffer
>>> index in the controller, then there is option for a little
>>> more performant solution when multiple Tx buffers are allocated
>>> for each class and they are sequentially filled till highest
>>> allocated buffer index is filled. Then there is some gap till
>>> all these buffers in given priority are sent because
>>> cyclic filling of the minimal index would result in reordering
>>> with possible break of some protocol requirements.
>>> Some controllers allows to attach DMA realized FIFOs to more
>>> Tx objects, in such case it would map to proposed design well
>>> too. Some newer controllers adds local priority bits above
>>> CAN ID ones (i.e. new NXP FlexCAN). This could allow cyclic
>>> use of some Tx objects/buffers similar to CTU CAN FD.
>>> There will be problems because multiple Tx buffers priorities
>>> are not reachable by single atomic operation like in CTU CAN FD
>>> case. But I have some idea how to implement sequential
>>> updates to ensure order in the class. There would be problem,
>>> that most controllers do not allow to update this information
>>> on the objects participating actively in arbitration. So it would
>>> lead to much more acrobation between eggs and some gap time,
>>> where none message is offered in the link arbitration even that
>>> there are pending user requests will be inevitable in some
>>> scenarios after some number of messages sent. That cannot
>>> be on the bus side worse that considering fixed order according
>>> to index. May be, it can be found that overhead does not worth
>>> that. But we preserve API in variants in all cases...
>>>
>>> 2) use of the CAN bus in applications requiring maximal bus
>>> transparency with minimal latency and SW load. This is
>>> totally opposite of the general CAN bus subsystem for
>>> general purpose RTOS. The API in this case should allocated
>>> Tx and Rx controller objects for the individual purposes/CAN IDs.
>>> Rx side SW processing can be considered as alternative and proposed
>>> framework allows to setup queues, but it has overhead and under
>>> extreme load it can lost some messages if HW is not performant enough.
>>> On Tx side it is even more problematic.
>>>
>>> But if this type of use of RTEMS for example for Autosar or Simulink
>>> generated code is considered then it is possible to extend actual
>>> proposed API by IOCTLs which allows to reserve some controller
>>> objects for specific purposes and allows to access them directly
>>> for minimal overhead and use under direct application control or attach
>>> separated controller side "canque_ends_dev_t" to such objects and
>>> propagate them to some clients to standard CAN read and write API.
>>>
>>> So I think that the proposed framework provides what is expected
>>> bu most of general purpose CAN/CAN FD framework users, tries to
>>> perpare a little even for come of CAN XL, solves problems which
>>> may be practically unsolved by all other generic approaches still.
>>> And we have some clue how to extend support for most/all other
>>> controllers and even some open doors to offer even ECU style
>>> API for applications which benefit from direct controller
>>> buffers use/allocation which is possible on controllers
>>> with abundant number of buffers (not case of SJA1000
>>> and very limited on CTU CAN FD - max 8 can be configured
>>> to silicon under actual registers map).
>>>
>>> I understand that the text is long but you have asked for
>>> it in the fact and I provide complete thought dump
>>> to analyes it.
>>
>> Thanks for the (very) detailed explanation. My intention was to
>> express that I'm completely OK with only one driver because you
>> clearly have thought about other hardware too. Your explanation just
>> makes it even more clear how much thought you put into it ;)
>>
>>>
>>> I would be happy if you and or others find time to look
>>> into actual code implementation to identify what could
>>> be issue for mainlining as soon as possible because
>>> after May 24 changes do not propagate into Michal Lenc's
>>> thesis text which can be alternative and more in depth
>>> documentation and analysis than what fits into official
>>> RTEMS one. The full document has already 47 pages and
>>> 34 of the actual text without content and appendices.
>>> Document includes benchmarks under RTEMS load by HTTP
>>> traffic, priority inversion prevention confirmation
>>> by measurements with performance data etc.
>>> It will be published on CTU in May or June
>>>    https://dspace.cvut.cz/
>>> and links will be added to
>>>    https://canbus.pages.fel.cvut.cz/
>>> same as for much shorter iCC article and presentation.
>>>
>>
>> Code review without patches or a review system is always a bit more
>> effort because there is nothing to add comments directly. It seems
>> that I can't register on the gitlab instance that you provided. So
>> let's try it here.
>>
>> I'll mainly take a look at the headers because they define the
>> interface. That's the most important part if you ask me. Bugs in the
>> code should be fixable later.
>>
>> I'll try to categorize my comments a bit. If it has a *Style* or
>> *Typo* in front of it, you can just ignore it. It's not really
>> important. It's just something that I noted while reading through the
>> code. *Question* or *Note* are more important.
>>
>> And please note: You know CAN a lot better than me. So quite possible
>> that I don't see a lot of stuff and that I might have some odd odd
>> questions.
>>
>>
>> ### First the ones that I plan to more or less ignore so that I can
>> concentrate on the important parts:
>>
>> Test or demo apps. So most likely not relevant for a review:
>>
>> ./rtems_can_test/can_test.h
>> ./rtems_can_test/app_def.h
>> ./rtems_can_test/system.h
>> ./rtems_can_test/can_register.h
>> ./zynq_template/app_def.h
>> ./zynq_template/zynq_reg.h
>> ./zynq_template/system.h
>>
>> Seems to be left over from some tests:
>>
>> ./lib/libbar/bar.h
>>
>> Driver specific files. I think these are not that high priority either:
>>
>> ./lib/candrv/ctucanfd/ctucanfd_txb.h
>> ./lib/candrv/ctucanfd/ctucanfd_kframe.h
>> ./lib/candrv/ctucanfd/ctucanfd_kregs.h
>> ./lib/candrv/dev/can/ctucanfd.h
>>
>>
>> ### Now the more important ones: The interfaces
>>
>> #### ./lib/candrv/dev/can/can.h
>>
>> *Style*: I would suggest to group defines a bit more. You already
>> used prefixes like RTEMS_CAN_QUEUE_* which is great. You can improve
>> that a bit more if you use Doxygen "@name" and "@{" ... "@}". For an
>> example take a look at
>>
>> https://gitlab.rtems.org/rtems/rtos/rtems/-/blob/main/cpukit/include/dev/i2c/i2c.h?ref_type=heads#L80
>>
>>
>> Which leads to a group in the doxygen output:
>>
>> https://docs.rtems.org/doxygen/branches/master/cpukit_2include_2dev_2i2c_2i2c_8h.html
>>
>>
>> The same is true for some other defines in other files. I won't
>> mention it every time.
>>
>>
>> *Question*: Why do you prefix some defines with RTEMS (like
>> RTEMS_CAN_CHIP_MODE) and others don't have that prefix (like
>> CAN_CTRLMODE_LOOPBACK)? The same is true for some other defines in
>> other files. I won't mention it every time.
>>
>>
>> *Style*: Sometimes you use Doxygen @brief. Sometimes \ref. I think it
>> works, but it's a bit odd.
>>
>>
>> *Question*: You have ioctls like RTEMS_CAN_DISCARD_QUEUES. According
>> to the description, that ioctl has a parameter. Why is it an _IO and
>> not an _IOW? The same is true for some more of the _IO ioctls.
>>
>>
>> *Typo*: The description of RTEMS_CAN_GET_BITTIMING has a "geets" in
>> it's comment.
>>
>>
>> *Note*: struct can_chip_ops doesn't have a description. I'm not
>> entirely sure what every member should do. For example:
>> check_bittiming: From the name I would expect that it only checks a
>> bit timing. From the ctucanfd.c it also sets a bit timing. I think it
>> would be good if you would add short descriptions here like "Check
>> and set a bittiming. Returns 0 on success or negated errno on error."
>>
>>
>> *Detail*: can_bus_register(...) and can_bitrate2bittiming are missing
>> a description. If they are a public interface, it would be good if
>> they would have one.
>>
>>
>> #### ./lib/candrv/dev/can/can-frame.h
>>
>> *Detail*: Description of the can_frame_header. You have field
>> descriptions like "This member holds the CAN ID value". My first
>> thought was that it is some kind of address. But with taking a look
>> at the code, it seems that it is a bit mask that is combined out of
>> CAN_ERR_ID_* defines. If a field is expected to contain a certain
>> group of defines: Can you add a note regarding that?
>>
>>
>> *Style*: You have a group of defines like CAN_ERR_*_DATA_BYTE. On the
>> first glance, I thought it would be the same group as CAN_ERR_ID_*. I
>> think I would have used CAN_ERR_DATA_BYTE_* instead. Of course that's
>> a style question and there are always good arguments for any style.
>> Again: A doxygen group using @{ and @} might achieve the same.
>>
>>
>> *Typo*: You have a CAN_ERR_PROT_LOC_DARA_BYTE instead of ..._DATA_BYTE.
>>
>>
>> *Question*: There are a lot of defines in can-frame.h like
>> CAN_ERR_PROT_LOC_DATA or CAN_ERR_TRX_UNSPEC. Is it clear for someone
>> more used to CAN, how to use these? Or would a description like
>> "Possible values for the Byte xyz in a can message" help?
>>
>>
>> #### ./lib/candrv/dev/can/can-filter.h
>>
>> *Detail*: Again: I'm not happy with the descriptions of the fields of
>> the structure. A field "flags" that is described as "This member
>> holds CAN flags" isn't really helpful. Which values can I assign to
>> that field? Is it a bit mask? Is it a field defined according to some
>> standard? In that case even a "Holds standard CAN flags" would be
>> useful because then I know that I just have to take a look at any CAN
>> documentation.
>>
>>
>> #### ./lib/candrv/dev/can/can-devcommon.h
>>
>> OK.
>>
>>
>> #### ./lib/candrv/dev/can/can-helpers.h
>>
>> *Note*: I don't like global defines like MAX_MSGOBJS without a
>> prefix. That's polluting the name space. Is there a reason that it
>> doesn't have the CAN or RTEMS_CAN prefix like all other defines?
>>
>> Similar: There are defines like "BIT". Is there a reason for using
>> such a generic name? If it (for example) helps porting existing
>> drivers from another stack, that's great. Otherwise, I don't like
>> these names.
>>
>> Some more in this file are: len2dlc, GENMASK, FIELD_PREP, FIELD_GET
>>
>>
>> Now I'm running out of time. I'll try to take a look at the following
>> files later or tomorrow:
>>
>
> Like promised:
>
> #### ./lib/candrv/dev/can/can-queue.h
>
> *Note*: The doxygen documentation of struct canqueue_slot_t didn't
> work as expected: You used for example @next to describe the "next"
> field. That clearly didn't work:
> https://otrees.pages.fel.cvut.cz/rtems/rtems-canfd/doc/doxygen/html/structcanque__slot__t.html
>
>
> *Question*: You use the Atomic_Uint from rtems/score/atomic.h for the
> slot_flags. For new code, I would suggest using the atomic_uint from
> stdatomic.h instead (C11). You have included that file already, so it
> shouldn't be a problem.
>
>
> *Question*: Why is string.h included in that header? I don't see any
> str*, mem* or stp* functions used.
>
>
> *Question*: Some of the functions have a bit of a short description.
> For example the canqueue_filter_match: The brief description basically
> just tells me exactly what the name of the function already told me.
> But how and in what situation would I use that function? How do these
> filters work? I can't tell that easily from the description or from
> the implementation of the function. Is it even thought as an interface
> that a user (in this case: someone writing a driver or an application)
> has to understand?
>
> Maybe I have a basic problem here: Which headers are (more or less)
> public ones (used to write drivers and applications) and which ones
> are internal only? Or in other words: Which headers will be installed?
>
> For this file, I strongly suspect that it is not user-facing. You have
> a lot of undocumented functions in it (like canqueue_next_inedge or
> canque_for_each_inedge).
>
> The files that are really relevant for review at the current point are
> mainly the ones that a user (again: someone writing a driver or an
> application) can see.
>
> If it is a public facing header: Did I miss some general description
> how a filter works and how a user should use it? It's quite possible
> that I missed that. I'm still only scratching the surface of your work
> at the moment.
>
>
> #### ./lib/candrv/dev/can/can-stats.h
>
> Looks OK.
>
> #### ./lib/candrv/dev/can/can-virtual.h
>
> OK.
>
> #### ./lib/candrv/dev/can/can-bittiming.h
>
> *Detail*: can_bittiming_const has a name with a fixed 32 char length.
> In ctucanfd.c you initialize that with a constant string. Is there
> some reason to have a fixed length string in RAM instead of using a
> pointer to a constant string that can have an arbitrary length and can
> be (for example) in the Flash?
>
>
> Best regards
>
> Christian
>
>>
>> Best regards
>>
>> Christian
>>
>>> Best wishes,
>>>
>>>                  Pavel
>>> -- 
>>>                  Pavel Pisa
>>>
>>>      phone:      +420 603531357
>>>      e-mail:     pisa at cmp.felk.cvut.cz
>>>      Department of Control Engineering FEE CVUT
>>>      Karlovo namesti 13, 121 35, Prague 2
>>>      university: http://control.fel.cvut.cz/
>>>      personal:   http://cmp.felk.cvut.cz/~pisa
>>>      company:    https://pikron.com/ PiKRON s.r.o.
>>>      Kankovskeho 1235, 182 00 Praha 8, Czech Republic
>>>      projects:   https://www.openhub.net/accounts/ppisa
>>>      social:     https://social.kernel.org/ppisa
>>>      CAN related:http://canbus.pages.fel.cvut.cz/
>>>      RISC-V education: https://comparch.edu.cvut.cz/
>>>      Open Technologies Research Education and Exchange Services
>>>      https://gitlab.fel.cvut.cz/otrees/org/-/wikis/home
>>
>


More information about the devel mailing list