[RFC] generic CAN/CAN FD susbsytem for RTEMS from scratch - online documentation

Mon Apr 29 19:23:58 UTC 2024

Dear Christian,

thanks a lot for finding time to read through documentation.

On Monday 29 of April 2024 10:56:29 Christian MAUDERER wrote:
> it's quite a big work. So I've started to read through the
> documentation to get an overview.

For others, code under review hosted in CTU university GitLab
server

  https://gitlab.fel.cvut.cz/otrees/rtems/rtems-canfd

Documentation

  https://otrees.pages.fel.cvut.cz/rtems/rtems-canfd/doc/can/can-html/can.html
  https://otrees.pages.fel.cvut.cz/rtems/rtems-canfd/doc/doxygen/html/index.html

Main developer behind extension to CAN FD and switch to RTEMS
is Michal Lenc.

The intention is to (hopefully) reach state when it meets criteria
to mainlining int RTEMS CPU kit under

  cpukit/dev/can

We plan to prefix all public functions by rtems_ prefix as the
final step after review and acceptance for mainline to not
pollute applications namespace. User visible structures
types for IOCTLs and farmes are planned to stay without
prefix for readability. They can be hidden by not including
given header file.

> Some questions: 
> Do I understand that correctly, that the only user-facing interface is
> via the "/dev/can*" files. There is no separate interface for special
> operations, right? That's completely OK for me. I just want to make
> sure that I understand it correctly.

Yes, it is the goal to use only character device as the only intreface
to the user application. LinCAN worked in standard POSIX environment
with MMU etc. and even on RTEMS we consider user and kernel
space as fully separated and only read, write, IOCTL exchange
data controlled way.

There are more functions which are intended for drivers developers
and for registration from BSP. 

> Chapter "1.1.1.1: Managing Queues": I assume the limitation that
> RTEMS_CAN_DISCARD_QUEUES removes all queues and not only selected ones
> is a limitation due to the nature of the ioctl interface or the driver
> capabilities? It could be a bit of an annoying limitation in an
> application that dynamically wants to register or unregister queues.

The single chip can be opened multiple times, each open instance
created their own canque_ends_user_t

https://otrees.pages.fel.cvut.cz/rtems/rtems-canfd/doc/doxygen/html/structcanque__ends__user__t.html

This structure allows to connect FIFO queues (canque_edge_t) at any time
in any direction including echo to itself or some intermediate node
to send messages to more links for redundancy etc. 

https://otrees.pages.fel.cvut.cz/rtems/rtems-canfd/doc/doxygen/html/structcanque__edge__t.html

That is internal implementation which is intended to be really flexible
and with minimal locking intervals - only during move to next queue
during iterations.

But we are trying to keep external interface reasonably simple, so we consider
only queues from single user (file-descriptor) to single chip corresponding 
to /dev/can during chosen during open for now.

The rationale for discard all only is that we if we allow manipulation of 
individual queues we cannot identify queues by pointers. We take as forbidden
to expose kernel pointer addresses of canque_edge_t. It can be resolved
by assigning some unique numbers for each edge created from given user
ends and returning these to user from RTEMS_CAN_CREATE_QUEUE. The initial
default connection can have ID 0. Then it would be easy iterate over
all queues going to/from given user and remove only that queue where
the specified ID matches. Other option is to limit which queues will
be removed based on CAN ID and mask... but there can be problem to uniquely
match such queue etc...

So the compromise is taken. You can remove all queues to given open instance
and then add queues one by one as you want with controlled CAN ID
filters and priority.

If you need to create dynamically some application which needs specific
CAN IDs and priorities and then you expect that it should not interact
with network any more, then you can open /dev/canX again and you have
private queues pool. When the file representation is closed then all
these queues will be removed. That all without any influence
to other open instances to same or other CAN device {if we do not
consider system load). 

I agree, that it is compromise. But adding yet another file descriptor
like multiplexor for queues to each file descriptor seems to me as
too much complexity. But it can be added. even later as IOCTL to remove
individual queues based on CAN ID matches or queues IDs if create
is modified to return internal queue IDs... 

> Chapter "1.1.1.3: Setting Mode": These look quite controller specific.
> If I want to add a controller that has a new mode: Would I just add a
> new flag? What happens if we reach 33 flags? Would an array or maybe a
> structure with a single uint32_t field be a better choice? I assume
> that if a controller doesn't support a mode, (like setting FD on a
> non-FD controller), the IOCTL will just return an error?

I hope that these modes should be mostly controller independent.
Mode roughly correspond to same definitions found in Linux SocketCAN.
There is list of our CAN_CTRLMODE_x

https://gitlab.fel.cvut.cz/otrees/rtems/rtems-canfd/-/blob/master/lib/candrv/dev/can/can.h?ref_type=heads#L98

May it be we should do review and remove some.
I.e. CAN_CTRLMODE_3_SAMPLES fits better into bittiming... 

> Chapter "1.1.3: Frame Transmission" and "1.1.4: Frame Reception": The
> last argument of "write" is a count. If I see a write, my first guess
> is that I have to call it like:
>
>    struct can_frame frames[10] = {... /* some values */ ...};
>    ssize_t rv = write( fd, frames, sizeof(frames) );

This is intended to be used such way but we try to consider even CAN XL
for future and require to write 2kB block from user space into kernel
space even for 8 bytes CAN message is too big overhead.
You can even limit size to store FD frames in application if you know
that you do not exceed some data size. So it is possible to specify shorter
size. Same for RX. But we keep one read/write to one CAN message
correspondence. There has been some advantage of original LinCAN
driver that more messages could be received or sent by single fort and back
supervisor mode switch. But with FD and even XL frames in future
and cheap "system" calls on RTEMS, I consider such optimization
for multiple messages for single system call as too errors prone
and complicating things... So one message, one read, write call. 

> But from taking a look at the tests in the repository, the count
> is calculated by can_framelen(). Is it possible to send or receive
> multiple CAN frames using write or read?

Not, not with single call

> Or is it always a single 
> frame?

Yes

> What happens if I pass a wrong length?

If you attempt to send mesage with data len where header+data do not fit
into write specified size bytes, -EMSGSIZE is returned.

> Do I send wrong data, 
> crash the system or the whole CAN bus or do I just get an error?

Only error and nothing is send.

> Can you make that more clear in the documentation?

For the read you receive -EMSGSIZE and message is not
disposed from queue, so you can reallocate buffer
and attempt read again. This can be again big gain for
CAN XL... You can receive short messages into some smaller 
user space buffers and XL to others etc... 

I agree that it should be documented more explicitly
in manual.

> Some details regarding "struct can_chip":
>
> * There is a pointer called "type". I would use a "const char *" for
>    that. I expect that stuff like names will never change and having
>    them constant allows to use a pointer to a flash memory area.

ACK

> * You have an "int irq". That's not fully compatible to the
>    rtems_vector_number which is an uint32_t (at the moment).

OK, we should adjust that to match

> Regarding the CAN drivers: Do I see it correctly, that currently only
> a single (real) device is supported (ctucanfd)?

Yes

> Did you check with 
> some other hardware controller, whether the whole structures / defines
> / flags close to the hardware do work well for other controllers too?

The code/concept is based on my previous LinCAN and OrtCAN work

https://ortcan.sourceforge.net/lincan/

Adding SJA1000 support should be easy as the code is part of LinCAN
and have been used and modified even for NXP LPC controllers.
Whole infrastructure should be still portable to more systems
when locking, tasks and interrupts are rewritten. On the other
hand, we have removed LinCAN system adaptation layer to directly
use RTEMS and be readable for maintainers.

> Like the mode flags from 1.1.1.3. It doesn't really matter which other
> controller. From my own attempts to write a driver stack, I just know
> that sometimes you make assumptions based on one controller that are
> hard to implement on another one. Usually it's not even necessary to
> really add a second controller. Just skimming through the manual can
> be really helpful. On the other hand, the Doxygen documentation
> mentions, that the concept is based on LinCAN. So maybe that already
> helps to avoid that trap?

I cannot 100% guarantee that all is thought around and around.

I have experience with XCAN, LPC CAN, SJA1000, OpenCores SJA on NuttX
in ESP32C3 (we have contributed driver into NuttX mainline). I have
provided common bittrimming to SocketCAN long ago, Michal Lenc implemented
SocketCAN NXP imxRT driver for NuttX (mainlined) already. I have
looked even into Zephyr and some Autosar concepts etc... We have
LinCAN variant including extension for USB based devices
and for MCP5200 MSCAN. We have helped Honeywell with correcting
a little RT behavior of SPI connected MCP2515 horror when they
used it on Raspberry Pi with Simulink generated code on some
experimental automotive ECU autotuning system. iMX6 with FlexCAN
is used in Elektroline.cz, where I provide some consultations.

I have been architect of rapid prototyping platforms developed
for Porsche Enginering Services based on TMS570LS3137 and we have
implemented there TTCAN like synchronous messages exchange there
with no timing and communication schedule execution dependency on CPU.
The actual proposed CAN subsystem proposal is not optimized
for such use, but is is very special and needs intercation
of application with lower levels or provide API to setup schedule
and update send data. I can think how to pas that by set of additional
IOCTLs. But RTEMS need some common CAN/CAN FD stack with
common functionality found on other systems.

So all in all I hope I have some idea what could appear in
CAN controllers ZOO... But yes, we can overlook something
and that is why I keep Oliver Hartkopp (SocketCAN architect)
in the loop to be opponent to our decision if he thinks that
our choices could lead to problems. Even decision/compromise
if SocketCAN (and LibBSD) dependency or simpler approach should
be our direction on simpler operating system as RTEMS is, has been
discussed with him and forwarded to the list

https://lists.rtems.org/pipermail/devel/2022-June/071972.html

when Prashanth S has started his GSoC. That effort did not solve
CAN driver design even that I have provided input into that GSOC
project how to develop stack in the LinCAN queues direction or other
reasonably usable FIFOs. The implemented buffers has been mess.
But I hope that some part of his BeagleBoneBlack D-CAN support
can be reused for D-CAN implementation based on our effort one day.

> Regarding the device names "/dev/can0" and similar: Currently that
> seems to be a fixed scheme, correct?

Not really, the CAN subsystem provides function 

  int can_bus_register( struct can_bus *bus, const char *bus_path );

so it is on the user, BSP integrator etc to decide naming scheme.
Some scheme is not part of the  libcandrv.a. We use simple
unique numbers generator in the test application during interfaces
registration

 Atomic_Uint idx = _Atomic_Fetch_add_uint(
    &candev_sqn,
    1,
    ATOMIC_ORDER_SEQ_CST
  );

> In my experience, sometimes it 
> can be useful to use longer names instead. For example
> "/dev/can_ctufd0". That's especially interesting if a system is
> initialized dynamically.

Understand

> For example if you have a USB to CAN adapter 
> that is enumerated more or less randomly during startup. Is that
> supported or are there some fixed assumptions that a can device is
> always called "/dev/canX"?

The /dev/canX is standard and simplest scheme used on many systems.
But internals of the project do not prevent to use PCI topological
address as part of the pah name.

Best wishes,

                Pavel

PS: Michal Lenc has succeed with this weekend experiment to add support
    of proposed RTEMS CAN interface into evolving yet toy Rapid Prototyping
    tool
      https://github.com/robertobucher/pysimCoder
    on base of experimental RTEMS target templates.

PS2: I have tested to build complete OrtCAN CANopen code against proposed
   CAN subsystem as well. It builds but I need to prepare top level
   application to link it successfully.

--
                Pavel Pisa

    phone:      +420 603531357
    e-mail:     pisa at cmp.felk.cvut.cz
    Department of Control Engineering FEE CVUT
    Karlovo namesti 13, 121 35, Prague 2
    university: http://control.fel.cvut.cz/
    personal:   http://cmp.felk.cvut.cz/~pisa
    company:    https://pikron.com/ PiKRON s.r.o.
    Kankovskeho 1235, 182 00 Praha 8, Czech Republic
    projects:   https://www.openhub.net/accounts/ppisa
    social:     https://social.kernel.org/ppisa
    CAN related:http://canbus.pages.fel.cvut.cz/
    RISC-V education: https://comparch.edu.cvut.cz/
    Open Technologies Research Education and Exchange Services
    https://gitlab.fel.cvut.cz/otrees/org/-/wikis/home