[RFC] generic CAN/CAN FD subsystem for RTEMS from scratch - online documentation

Pavel Pisa ppisa4lists at pikron.com
Mon May 6 09:27:30 UTC 2024


Dear Christian,

On Tuesday 30 of April 2024 08:40:43 Christian MAUDERER wrote:
> > For others, code under review hosted in CTU university GitLab
> > server
> >    https://gitlab.fel.cvut.cz/otrees/rtems/rtems-canfd
> > Documentation 
> >    https://otrees.pages.fel.cvut.cz/rtems/rtems-canfd/doc/can/can-html/can.html
> >    https://otrees.pages.fel.cvut.cz/rtems/rtems-canfd/doc/doxygen/html/index.html
> >
> > Main developer behind extension to CAN FD and switch to RTEMS
> > is Michal Lenc.
> >
> > The intention is to (hopefully) reach a state where it meets the criteria
> > for mainlining into the RTEMS CPU kit under
> >
> >    cpukit/dev/can
...
> > I agree that it is a compromise. But adding yet another file-descriptor-like
> > multiplexer for queues to each file descriptor seems to me to be
> > too much complexity. But it can be added even later, as an IOCTL to remove
> > individual queues based on CAN ID matches or queue IDs if create
> > is modified to return internal queue IDs...
>
> I somehow missed that you can open the device multiple times and get
> independent queues. With that, it's completely OK and should be flexible
> enough for most applications.
>
> It's great that you already have put some thought into how it could be
> extended later if some application needs more flexibility.
...
> >> Did you check with
> >> some other hardware controller, whether the whole structures / defines
> >> / flags close to the hardware do work well for other controllers too?
> >
> > The code/concept is based on my previous LinCAN and OrtCAN work
> >
> > https://ortcan.sourceforge.net/lincan/
...
> I didn't want to doubt your competence. Like I said it's some trap that
> I have fallen into often enough myself (like when guiding Prashanth's
> GSoC project). But it's clear that you have put a lot of thought into
> that. So I would expect that there shouldn't be much trouble with most
> controllers. Maybe except for the ones where a semiconductor vendor
> thought it would be a good idea to create a completely different
> concept. But these are always difficult.

I agree with the discussion and with searching for hard arguments.

The solution is a compromise. In general, the CAN bus concept
is optimized for direct replacement of wires in a car
running between distinct units, and its use as a general
communication solution has some difficulties and requires
some compromises.

For small devices with a predefined purpose and for Autosar,
it is ideal to allocate one communication object on the
controller for each CAN ID (wire signal) to be sent.
The same holds for each received signal value, or for a set of
them in a single frame. Most controllers are equipped with
filters and a mechanism to do so, including selection of the
Tx message object for physical bus-link arbitration
according to priority. The sending side then updates the
signal value in the corresponding Tx object and the receiving
side sees the most recent one, usually on a best-effort basis:
older unread frames are overwritten by the updated value.

But even in a simple ECU, there are obstacles to using this
principle for all kinds of communication. The CAN bus is also
used for firmware updates and general configuration. In this
case, reliable delivery of all messages with a given
CAN ID is required, because the whole sequence has to be
received and processed and the state evolution is tied
to the sequence. If a single message is lost, then all
data are unusable. Because the sequence requires exact ordering,
it is typical that only a single Tx object is used. On the Rx
side it can be a problem to capture all frames without
overwrite with a single Rx object, so some controllers add a FIFO
which can be attached to each object, or some mechanism
to allocate more Rx objects and pass them to the user
in FIFO order.
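
The two Rx styles described above can be contrasted in a minimal
sketch. All names here (struct frame, struct mailbox, struct fifo and
their functions) are hypothetical and only illustrate the semantics;
they are not taken from the proposed framework or from any controller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical minimal frame type, not the one from the framework. */
struct frame {
    uint32_t id;
    uint8_t  len;
    uint8_t  data[8];
};

/* "Signal" style Rx object: a newer frame overwrites an unread one. */
struct mailbox {
    struct frame latest;
    bool         valid;
    unsigned     overwritten;   /* unread frames lost by overwrite */
};

static void mailbox_put(struct mailbox *mb, const struct frame *f)
{
    if (mb->valid)
        mb->overwritten++;
    mb->latest = *f;
    mb->valid = true;
}

static bool mailbox_get(struct mailbox *mb, struct frame *out)
{
    if (!mb->valid)
        return false;
    *out = mb->latest;
    mb->valid = false;
    return true;
}

/* "Stream" style Rx path: every frame is kept, in arrival order. */
#define FIFO_DEPTH 4

struct fifo {
    struct frame slot[FIFO_DEPTH];
    unsigned     head;    /* index of the oldest stored frame */
    unsigned     count;   /* number of stored frames */
};

static bool fifo_put(struct fifo *q, const struct frame *f)
{
    if (q->count == FIFO_DEPTH)
        return false;     /* full: the caller has to handle the overrun */
    q->slot[(q->head + q->count) % FIFO_DEPTH] = *f;
    q->count++;
    return true;
}

static bool fifo_get(struct fifo *q, struct frame *out)
{
    if (q->count == 0)
        return false;
    *out = q->slot[q->head];
    q->head = (q->head + 1) % FIFO_DEPTH;
    q->count--;
    return true;
}
```

After three frames with the same ID are stored, the mailbox returns only
the last one, which is fine for a signal value, while the FIFO returns
all three in arrival order, which is what a firmware update stream needs.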

That works for small ECUs with single-purpose firmware.
But on a general-purpose operating system, which should
allow even complete monitoring of the CAN bus, dynamically
started applications and even whole virtual
CAN/CANopen nodes, allocating the controller Tx/Rx message
objects for each specific purpose is impossible.

That is why all generic CAN subsystems which I know
(CAN4Linux, LinCAN, SocketCAN, NuttX char device CAN,
Peak's Windows drivers etc.) define an API based on
opening the driver and presenting received messages
in FIFO order to the application (with options for software
filtering, which is usually not propagated to the controller
HW. LinCAN has an option to combine the user FIFOs into a
mask and ID propagated to the HW, but you usually end up
needing to receive everything anyway, so it has not been
used in the end). The Tx FIFO order is required for messages
with the same ID, or sometimes even within one stream of messages
with changing IDs, for correct realization of some higher
level protocols.
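
Why merging user filters into one hardware mask/ID pair tends to end in
"receive everything" can be shown with a small sketch. The struct
can_filter type and both functions are hypothetical illustrations of the
usual id/mask matching rule; this is not the LinCAN code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical acceptance filter: a frame matches when all ID bits
 * selected by mask are equal to the corresponding bits of id. */
struct can_filter {
    uint32_t id;
    uint32_t mask;
};

static bool filter_match(const struct can_filter *f, uint32_t can_id)
{
    return ((can_id ^ f->id) & f->mask) == 0;
}

/* A single id/mask filter that accepts everything a or b accepts.
 * Only bits that both filters check, and on which they agree,
 * stay checked in the result. */
static struct can_filter filter_union(struct can_filter a,
                                      struct can_filter b)
{
    struct can_filter u;

    u.mask = a.mask & b.mask & ~(a.id ^ b.id);
    u.id = a.id & u.mask;
    return u;
}
```

Merging exact filters for 0x100 and 0x101 still gives a useful mask of
0x7FE, but merging 0x100 with 0x6FF leaves a mask of 0, so the hardware
passes every frame and all filtering is again done in software.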

The result is that even on hardware equipped with multiple
Tx objects, but without a special cyclic queue preserving Tx FIFO
order, only a single Tx object is used to realize
transmission of all messages, for example SocketCAN on the
XCAN controller. So only part of the CAN bus media
bandwidth can be utilized by a single node. Maybe that is sometimes
lucky, because CAN IDs are not correctly allocated according
to priority even in critical subsystems of cars. On the Rx side, the
original buffers approach is hard to use in an order-preserving FIFO
concept, but most of today's controllers add some option to keep the
order and leave processing and distribution to the software side.
See the evolution from CCAN to DCAN to overcome that problem.
We even made LinCAN support for CCAN many, many years ago,
which somehow kept the required properties, but it was a headache.

So back to generic OS CAN interfaces: all that I know are based
on FIFO(s). Most of them keep strict FIFO order on the Tx side,
which results in HoL (head-of-line) blocking and priority
inversion on a bus loaded by middle-priority traffic from another node.

That is why SocketCAN adds the alloc_candev_mqs (multiple queues)
alternative for drivers

https://elixir.bootlin.com/linux/latest/source/drivers/net/can/dev/dev.c#L249

but as far as I know, no mainline kernel driver is using it.
We did some work to research, and even slightly extend, the
Linux networking QoS subsystem to solve buffer bloat by old
messages for traffic requiring best effort (most up-to-date
data for control) for given IDs, and to limit the bandwidth
of others or of virtual guests connected through QEMU to a
physical bus etc. That was many years ago, at a time when multi-queue
was not yet available on the Linux side. I have a long-term plan
to extend the CTU CAN FD mainline Linux driver with this support,
probably making it the first example of how to overcome HoL/priority
inversion in the Linux CAN subsystem. It was planned in the original
LinCAN before SocketCAN, and it is now implemented in the proposed
RTEMS CAN/FD framework, where an application can set up multiple
queues, even for a single open instance, with different Tx priority
classes. When used and mapped correctly to CAN IDs, this can
prevent priority inversion. It is not generic, because it is
quite expensive for deeper FIFOs, and even the mutual order of
Tx messages has to be preserved for many protocols, as discussed
earlier. The CTU CAN FD IP core interface to software was architected
by me to allow maximal utilization of the Tx buffers and their
reallocation when needed for a higher-priority message.
Wait for DTP processing and publication of our international CAN
Conference 2024 article, or come and meet us next week in Baden-Baden
  https://www.can-cia.org/icc/
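
The idea of Tx queues grouped into priority classes can be sketched as
follows. This is only an illustration: the names, the queue depth and
the convention that index 0 is the highest class are assumptions and do
not reflect the actual data structures of the proposed framework.

```c
#include <stdbool.h>
#include <stdint.h>

#define TX_CLASSES 3   /* the text mentions 3 classes by default */
#define TX_DEPTH   8   /* hypothetical queue depth */

struct tx_fifo {
    uint32_t id[TX_DEPTH];
    unsigned head;
    unsigned count;
};

struct tx_sched {
    struct tx_fifo cls[TX_CLASSES];  /* assumed: index 0 = highest class */
};

static bool tx_enqueue(struct tx_sched *s, unsigned cls, uint32_t can_id)
{
    struct tx_fifo *q;

    if (cls >= TX_CLASSES)
        return false;
    q = &s->cls[cls];
    if (q->count == TX_DEPTH)
        return false;
    q->id[(q->head + q->count) % TX_DEPTH] = can_id;
    q->count++;
    return true;
}

/* Take the oldest frame of the highest-priority non-empty class. */
static bool tx_dequeue(struct tx_sched *s, uint32_t *can_id)
{
    for (unsigned c = 0; c < TX_CLASSES; c++) {
        struct tx_fifo *q = &s->cls[c];

        if (q->count == 0)
            continue;
        *can_id = q->id[q->head];
        q->head = (q->head + 1) % TX_DEPTH;
        q->count--;
        return true;
    }
    return false;
}
```

Frames inside one class keep their FIFO order, which the higher level
protocols need, while a frame queued later into a higher class is
selected before older frames of lower classes.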

There are two branches of thought from this point:

1) how it maps to other controllers

For those equipped with a single Tx object only (e.g. SJA1000),
it maps well, because the attempt to repeat Tx and arbitration
can be disabled when a higher-priority queue becomes ready,
and our CAN infrastructure allows pushing back the lower
priority message and scheduling the higher one to be sent.
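
The push-back for a controller with one Tx object could look roughly
like the sketch below. It models the hardware as a single slot and
assumes that an abort always succeeds before the frame is sent; a real
driver would have to check the controller status. All names are
hypothetical, and index 0 as the highest class is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define CLASSES 3
#define DEPTH   4

struct ring {
    uint32_t id[DEPTH];
    unsigned head, count;
};

struct single_tx {
    struct ring q[CLASSES];  /* assumed: index 0 = highest class */
    bool     hw_busy;        /* the one hardware Tx object is loaded */
    uint32_t hw_id;
    unsigned hw_cls;
};

static void ring_push_front(struct ring *r, uint32_t id)
{
    r->head = (r->head + DEPTH - 1) % DEPTH;
    r->id[r->head] = id;
    r->count++;
}

static uint32_t ring_pop_front(struct ring *r)
{
    uint32_t id = r->id[r->head];

    r->head = (r->head + 1) % DEPTH;
    r->count--;
    return id;
}

/* Load the hardware slot, replacing a lower-class frame if needed. */
static void tx_kick(struct single_tx *t)
{
    for (unsigned c = 0; c < CLASSES; c++) {
        if (t->q[c].count == 0)
            continue;
        if (t->hw_busy) {
            if (c >= t->hw_cls)
                return;   /* loaded frame has the same or higher class */
            /* Assumed: abort confirmed, the frame was not sent. */
            ring_push_front(&t->q[t->hw_cls], t->hw_id);
        }
        t->hw_id = ring_pop_front(&t->q[c]);
        t->hw_cls = c;
        t->hw_busy = true;
        return;
    }
}

static bool tx_enqueue(struct single_tx *t, unsigned cls, uint32_t id)
{
    struct ring *r;
    unsigned reserved;

    if (cls >= CLASSES)
        return false;
    r = &t->q[cls];
    /* Keep one slot free for a frame pushed back from the hardware. */
    reserved = (t->hw_busy && t->hw_cls == cls) ? 1 : 0;
    if (r->count + reserved >= DEPTH)
        return false;
    r->id[(r->head + r->count) % DEPTH] = id;
    r->count++;
    tx_kick(t);
    return true;
}

/* Called when the hardware reports that the loaded frame was sent. */
static void tx_done(struct single_tx *t)
{
    t->hw_busy = false;
    tx_kick(t);
}
```

The pushed-back frame returns to the head of its own queue, so the order
inside the lower class is not disturbed by the preemption.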

For more complex ones, if they do not allow control of the Tx object
order, then only a single Tx object can be used. That is bad, it means
link underutilization, but it is the standard in SocketCAN and other
CAN solutions for general-purpose operating systems today. All
controllers which I know allow stopping the Tx attempt repeat, and I
hope to see in all of them an option to check whether the latest
attempt was successful or not. So the new RTEMS CAN can use them the
same way as the SJA1000. On the Rx side, most have a FIFO-preserving
option to use multiple buffers. Sometimes it is partially
broken, burdened by errata etc. (like on iMX RT, where
we overcame these problems in the NuttX drivers).
When the number of Tx priority classes is limited (for the proposed
system 3 by default, but compile-time configurable), then
we can allocate one Tx buffer for each class, which is easy and
prevents HoL priority inversion even on simple controllers.
If there is an option to order Tx according to the buffer
index in the controller, then there is an option for a slightly
more performant solution, where multiple Tx buffers are allocated
for each class and they are filled sequentially until the highest
allocated buffer index is filled. Then there is some gap until
all these buffers in the given priority class are sent, because
cyclic refilling from the minimal index would result in reordering
with a possible break of some protocol requirements.
Some controllers allow attaching DMA-realized FIFOs to several
Tx objects; in such a case it would map to the proposed design well
too. Some newer controllers add local priority bits above the
CAN ID ones (e.g. the new NXP FlexCAN). This could allow cyclic
use of some Tx objects/buffers similar to CTU CAN FD.
There will be problems because the priorities of multiple Tx buffers
cannot be updated by a single atomic operation as in the CTU CAN FD
case. But I have some idea how to implement sequential
updates to ensure order within the class. There would be the problem
that most controllers do not allow updating this information
on objects actively participating in arbitration. So it would
lead to much more acrobatics between eggs, and some gap time
where no message is offered in the link arbitration, even though
there are pending user requests, will be inevitable in some
scenarios after some number of messages sent. On the bus side, that
cannot be worse than using a fixed order according
to index. Maybe it will turn out that the overhead is not worth
it. But we preserve the API in all variants...
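
The index-ordered variant mentioned above (several Tx buffers per class,
filled sequentially up to the highest index, then a gap until all of
them are sent) can be sketched as a small allocator. The names and the
number of buffers per class are hypothetical, and the sketch covers only
the bookkeeping, not any real controller access.

```c
#include <stdbool.h>

#define BUFS_PER_CLASS 3   /* hypothetical number of Tx buffers per class */

struct class_bufs {
    unsigned next_fill;   /* lowest buffer index not yet used this round */
    unsigned in_flight;   /* buffers loaded and not yet sent */
};

/* Returns the buffer index to load, or -1 when the class has to wait. */
static int class_buf_alloc(struct class_bufs *cb)
{
    if (cb->next_fill == BUFS_PER_CLASS)
        return -1;         /* highest index used: wait for a full drain */
    cb->in_flight++;
    return (int)cb->next_fill++;
}

/* Called when one buffer of the class has been sent. */
static void class_buf_sent(struct class_bufs *cb)
{
    if (--cb->in_flight == 0)
        cb->next_fill = 0; /* all sent: safe to restart from index 0 */
}
```

After the third buffer is loaded, further requests are refused until all
three frames are sent, even when buffer 0 is already free. Reusing
buffer 0 earlier would let the new frame overtake older frames still
waiting in buffers 1 and 2, which is the reordering the text warns about.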

2) use of the CAN bus in applications requiring maximal bus
transparency with minimal latency and SW load. This is the
total opposite of a general CAN bus subsystem for a
general-purpose RTOS. The API in this case should allocate
Tx and Rx controller objects for individual purposes/CAN IDs.
Rx-side SW processing can be considered as an alternative, and the
proposed framework allows setting up queues, but it has overhead, and
under extreme load it can lose some messages if the HW is not
performant enough. On the Tx side it is even more problematic.

But if this type of use of RTEMS is considered, for example for Autosar
or Simulink generated code, then it is possible to extend the currently
proposed API by IOCTLs which allow reserving some controller
objects for specific purposes and either accessing them directly,
for minimal overhead and use under direct application control, or
attaching a separate controller-side "canque_ends_dev_t" to such
objects and propagating them to some clients through the standard CAN
read and write API.

So I think that the proposed framework provides what is expected
by most users of a general-purpose CAN/CAN FD framework, tries to
prepare a little even for the coming of CAN XL, and solves problems
which may still be practically unsolved by all other generic approaches.
And we have some clue how to extend support to most/all other
controllers, and even some open doors to offer an ECU-style
API for applications which benefit from direct controller
buffer use/allocation, which is possible on controllers
with an abundant number of buffers (not the case for SJA1000,
and very limited on CTU CAN FD, where at most 8 can be configured
in silicon under the current register map).

I understand that the text is long, but you have in fact asked
for it, and I provide a complete thought dump
for analysis.

I would be happy if you and/or others find time to look
into the actual code implementation to identify what could
be an issue for mainlining as soon as possible, because
after May 24 changes will not propagate into Michal Lenc's
thesis text, which can serve as alternative and more in-depth
documentation and analysis than what fits into the official
RTEMS one. The full document already has 47 pages, with
34 of actual text without contents and appendices.
The document includes benchmarks under RTEMS load by HTTP
traffic, confirmation of priority inversion prevention
by measurements with performance data etc.
It will be published at CTU in May or June
  https://dspace.cvut.cz/
and links will be added to
  https://canbus.pages.fel.cvut.cz/
same as for the much shorter iCC article and presentation.

Best wishes,

                Pavel
--
                Pavel Pisa

    phone:      +420 603531357
    e-mail:     pisa at cmp.felk.cvut.cz
    Department of Control Engineering FEE CVUT
    Karlovo namesti 13, 121 35, Prague 2
    university: http://control.fel.cvut.cz/
    personal:   http://cmp.felk.cvut.cz/~pisa
    company:    https://pikron.com/ PiKRON s.r.o.
    Kankovskeho 1235, 182 00 Praha 8, Czech Republic
    projects:   https://www.openhub.net/accounts/ppisa
    social:     https://social.kernel.org/ppisa
    CAN related:http://canbus.pages.fel.cvut.cz/
    RISC-V education: https://comparch.edu.cvut.cz/
    Open Technologies Research Education and Exchange Services
    https://gitlab.fel.cvut.cz/otrees/org/-/wikis/home

