[PATCH rtems-libbsd 1/2] racoon/session: Honor file descriptor maximum
Christian MAUDERER
christian.mauderer at embedded-brains.de
Mon Mar 29 07:10:16 UTC 2021
Hello Chris,
Am 28.03.21 um 04:43 schrieb Chris Johns:
>
>
> On 22/3/21 7:45 pm, Christian MAUDERER wrote:
>> Hello Chris,
>>
>> Am 19.03.21 um 02:11 schrieb Chris Johns:
>>> On 3/3/21 7:41 pm, Christian MAUDERER wrote:
>>>> Hello Chris,
>>>>
>>>> Am 03.03.21 um 02:17 schrieb Chris Johns:
>>>>> On 2/3/21 7:26 pm, Christian MAUDERER wrote:
>>>>>> Hello Chris,
>>>>>>
>>>>>> Am 02.03.21 um 01:03 schrieb Chris Johns:
>>>>>>> On 1/3/21 7:24 pm, Christian MAUDERER wrote:
>>>>>>>> Hello Chris,
>>>>>>>>
>>>>>>>> thanks for the review.
>>>>>>>>
>>>>>>>> Am 26.02.21 um 19:04 schrieb Chris Johns:
>>>>>>>>> On 26/2/21 2:01 am, Christian Mauderer wrote:
>>>>>>>>>> Dynamically allocate a big enough file descriptor set for select(). A
>>>>>>>>>> better solution would be to use kqueue() instead of select().
>>>>>>>>>> ---
>>>>>>>>>> .../racoon/rtems-bsd-racoon-session-data.h | 6 +--
>>>>>>>>>> ipsec-tools/src/racoon/session.c | 40
>>>>>>>>>> +++++++++++++++++++
>>>>>>>>>> 2 files changed, 43 insertions(+), 3 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/ipsec-tools/src/racoon/rtems-bsd-racoon-session-data.h
>>>>>>>>>> b/ipsec-tools/src/racoon/rtems-bsd-racoon-session-data.h
>>>>>>>>>> index b869a1518..196107a35 100644
>>>>>>>>>> --- a/ipsec-tools/src/racoon/rtems-bsd-racoon-session-data.h
>>>>>>>>>> +++ b/ipsec-tools/src/racoon/rtems-bsd-racoon-session-data.h
>>>>>>>>>> @@ -2,11 +2,11 @@
>>>>>>>>>> #include <rtems/linkersets.h>
>>>>>>>>>> #include "rtems-bsd-racoon-data.h"
>>>>>>>>>> /* session.c */
>>>>>>>>>> -RTEMS_LINKER_RWSET_CONTENT(bsd_prog_racoon, static fd_set active_mask);
>>>>>>>>>> -RTEMS_LINKER_RWSET_CONTENT(bsd_prog_racoon, static fd_set preset_mask);
>>>>>>>>>> +RTEMS_LINKER_RWSET_CONTENT(bsd_prog_racoon, static _types_fd_set
>>>>>>>>>> *allocated_active_mask);
>>>>>>>>>> +RTEMS_LINKER_RWSET_CONTENT(bsd_prog_racoon, static _types_fd_set
>>>>>>>>>> *allocated_preset_mask);
>>>>>>>>>> RTEMS_LINKER_RWSET_CONTENT(bsd_prog_racoon, static int nfds);
>>>>>>>>>> RTEMS_LINKER_RWSET_CONTENT(bsd_prog_racoon, static int signals[]);
>>>>>>>>>> RTEMS_LINKER_RWSET_CONTENT(bsd_prog_racoon, static sig_atomic_t
>>>>>>>>>> volatile
>>>>>>>>>> volatile sigreq[]);
>>>>>>>>>> -RTEMS_LINKER_RWSET_CONTENT(bsd_prog_racoon, static struct fd_monitor
>>>>>>>>>> fd_monitors[]);
>>>>>>>>>> +RTEMS_LINKER_RWSET_CONTENT(bsd_prog_racoon, static struct fd_monitor
>>>>>>>>>> *allocated_fd_monitors);
>>>>>>>>>> RTEMS_LINKER_RWSET_CONTENT(bsd_prog_racoon, static struct
>>>>>>>>>> fd_monitor_list
>>>>>>>>>> fd_monitor_tree[]);
>>>>>>>>>> RTEMS_LINKER_RWSET_CONTENT(bsd_prog_racoon, static struct sched
>>>>>>>>>> scflushsa);
>>>>>>>>>> diff --git a/ipsec-tools/src/racoon/session.c
>>>>>>>>>> b/ipsec-tools/src/racoon/session.c
>>>>>>>>>> index 65124c15e..90120c761 100644
>>>>>>>>>> --- a/ipsec-tools/src/racoon/session.c
>>>>>>>>>> +++ b/ipsec-tools/src/racoon/session.c
>>>>>>>>>> @@ -65,6 +65,10 @@
>>>>>>>>>> #include <sys/stat.h>
>>>>>>>>>> #include <paths.h>
>>>>>>>>>> #include <err.h>
>>>>>>>>>> +#ifdef __rtems__
>>>>>>>>>> +#include <sys/param.h>
>>>>>>>>>> +#include <rtems/libio_.h>
>>>>>>>>>> +#endif /* __rtems__ */
>>>>>>>>>> #include <netinet/in.h>
>>>>>>>>>> #include <resolv.h>
>>>>>>>>>> @@ -123,8 +127,16 @@ static void check_sigreq __P((void));
>>>>>>>>>> static void check_flushsa __P((void));
>>>>>>>>>> static int close_sockets __P((void));
>>>>>>>>>> +#ifndef __rtems__
>>>>>>>>>> static fd_set preset_mask, active_mask;
>>>>>>>>>> static struct fd_monitor fd_monitors[FD_SETSIZE];
>>>>>>>>>> +#else /* __rtems__ */
>>>>>>>>>> +static fd_set *allocated_preset_mask, *allocated_active_mask;
>>>>>>>>>> +static struct fd_monitor *allocated_fd_monitors;
>>>>>>>>>> +#define preset_mask (*allocated_preset_mask)
>>>>>>>>>> +#define active_mask (*allocated_active_mask)
>>>>>>>>>> +#define fd_monitors (allocated_fd_monitors)
>>>>>>>>>> +#endif /* __rtems__ */
>>>>>>>>>> static TAILQ_HEAD(fd_monitor_list, fd_monitor)
>>>>>>>>>> fd_monitor_tree[NUM_PRIORITIES];
>>>>>>>>>> static int nfds = 0;
>>>>>>>>>> @@ -134,7 +146,11 @@ static struct sched scflushsa =
>>>>>>>>>> SCHED_INITIALIZER();
>>>>>>>>>> void
>>>>>>>>>> monitor_fd(int fd, int (*callback)(void *, int), void *ctx, int
>>>>>>>>>> priority)
>>>>>>>>>> {
>>>>>>>>>> +#ifndef __rtems__
>>>>>>>>>> if (fd < 0 || fd >= FD_SETSIZE) {
>>>>>>>>>> +#else /* __rtems__ */
>>>>>>>>>> + if (fd < 0 || fd >= rtems_libio_number_iops) {
>>>>>>>>>> +#endif /* __rtems__ */
>>>>>>>>>> plog(LLV_ERROR, LOCATION, NULL, "fd_set overrun");
>>>>>>>>>> exit(1);
>>>>>>>>>> }
>>>>>>>>>> @@ -158,7 +174,11 @@ monitor_fd(int fd, int (*callback)(void *, int), void
>>>>>>>>>> *ctx, int priority)
>>>>>>>>>> void
>>>>>>>>>> unmonitor_fd(int fd)
>>>>>>>>>> {
>>>>>>>>>> +#ifndef __rtems__
>>>>>>>>>> if (fd < 0 || fd >= FD_SETSIZE) {
>>>>>>>>>> +#else /* __rtems__ */
>>>>>>>>>> + if (fd < 0 || fd >= rtems_libio_number_iops) {
>>>>>>>>>> +#endif /* __rtems__ */
>>>>>>>>>> plog(LLV_ERROR, LOCATION, NULL, "fd_set overrun");
>>>>>>>>>> exit(1);
>>>>>>>>>> }
>>>>>>>>>> @@ -186,7 +206,22 @@ session(void)
>>>>>>>>>> struct fd_monitor *fdm;
>>>>>>>>>> nfds = 0;
>>>>>>>>>> +#ifndef __rtems__
>>>>>>>>>> FD_ZERO(&preset_mask);
>>>>>>>>>> +#else /* __rtems__ */
>>>>>>>>>> + allocated_preset_mask = calloc(sizeof(fd_set),
>>>>>>>>>> + howmany(rtems_libio_number_iops, sizeof(fd_set) * 8));
>>>>>>>>>
>>>>>>>>> Does `maxfiles` work here?
>>>>>>>>>
>>>>>>>>
>>>>>>>> To be honest: I'm not sure.
>>>>>>>>
>>>>>>>> According to the comment in file.h:
>>>>>>>>
>>>>>>>> extern int maxfiles; /* kernel limit on number of open files */
>>>>>>>>
>>>>>>>
>>>>>>> Yes.
>>>>>>>
>>>>>>>> Sounds like it _can_ be the same as the maximum file number but doesn't
>>>>>>>> have to.
>>>>>>>
>>>>>>> I think we need to have them be the same value.
>>>>>>>
>>>>>>>> I didn't find where we implement it. It's declared as an extern int
>>>>>>>> maxfiles but
>>>>>>>> I didn't find any definition. I found it only in libbsd in
>>>>>>>> freebsd/sys/kern/uipc_socket.c where it is defined like follows:
>>>>>>>>
>>>>>>>> #define maxfiles rtems_libio_number_iops
>>>>>>>
>>>>>>> Ah OK. I knew it had been assigned somewhere and yes it looks like it is
>>>>>>> local
>>>>>>> to that file.
>>>>>>>
>>>>>>>>
>>>>>>>> So question is: Where and how is maxfiles defined?
>>>>>>>>
>>>>>>>
>>>>>>> I have provided a value in the rtemsbsd init file as part of the set og
>>>>>>> globals
>>>>>>> we need to maintained.
>>>>>>
>>>>>> Somehow I missed that. Where can I find it?
>>>>>
>>>>> Again sorry, I was in a rush and I was not clear. I have add this in my new
>>>>> changes for NFSv4 support and this is what I sorted out.
>>>>>
>>>>
>>>> Ah, OK. In that case I'll keep the rtems_libio_number_iops at the moment if it
>>>> is OK for you.
>>>
>>> Yes that is fine.
>>>
>>
>> Thanks.
>>
>>>>
>>>>>>
>>>>>>>
>>>>>>>>>> + if (allocated_preset_mask == NULL)
>>>>>>>>>> + errx(1, "failed to allocate preset_mask");
>>>>>>>>>> + allocated_active_mask = calloc(sizeof(fd_set),
>>>>>>>>>> + howmany(rtems_libio_number_iops, sizeof(fd_set) * 8));
>>>>>>>>>> + if (allocated_active_mask == NULL)
>>>>>>>>>> + errx(1, "failed to allocate active_mask");
>>>>>>>>>> + allocated_fd_monitors = calloc(
>>>>>>>>>> + rtems_libio_number_iops, sizeof(struct fd_monitor));
>>>>>>>>>> + if (allocated_fd_monitors == NULL)
>>>>>>>>>> + errx(1, "failed to allocate fd_monitors");
>>>>>>>>>
>>>>>>>>> At the core of this issue is the rotating fd allocation that we have in
>>>>>>>>> libio. A
>>>>>>>>> report from a FreeBSD machine I have is:
>>>>>>>>>
>>>>>>>>> $ sysctl -a | grep maxfiles
>>>>>>>>> kern.maxfiles: 1037243
>>>>>>>>> kern.maxfilesperproc: 933516
>>>>>>>>>
>>>>>>>>> I doubt a select process in FreeBSD needs a select array with that many
>>>>>>>>> bits. I
>>>>>>>>> have added similar code else where but I am wondering if this is really
>>>>>>>>> what we
>>>>>>>>> want to do. A side effect for us is the stack usage is not bounded and that
>>>>>>>>> is a
>>>>>>>>> problem. I think our newlib based max setting is too small.
>>>>>>>>
>>>>>>>> I think we have to distinguish between FreeBSD kernel space and user space.
>>>>>>>> If I
>>>>>>>> have seen it correctly, FreeBSD uses kqueues or maybe sometimes poll
>>>>>>>> instead of
>>>>>>>> select in kernel space most of the time. That avoids the problem with big
>>>>>>>> file
>>>>>>>> numbers.
>>>>>>>
>>>>>>> Yes kqueue is nice but we need to support existing code and this is a primary
>>>>>>> role libbsd needs to perform. The lack is signals in libbsd is another
>>>>>>> source of
>>>>>>> issue.
>>>>>>>
>>>>>>
>>>>>> I didn't want to say that we should not support select. I only wanted to say
>>>>>> that FreeBSD most likely just avoids that problem in kernel space so that they
>>>>>> don't have to support a select with 1037243 files. In user space they limit a
>>>>>> select to 1024 files.
>>>>>
>>>>> Yes they do. My suggestion of us using 1024 is FreeBSD applications are fine
>>>>> with this then they should be portable to RTEMS and libbsd and the need for us
>>>>> to patch code is removed.
>>>>>
>>>>
>>>> 1024 for a select is quite much for an embedded system. It means that we will
>>>> need 128 bytes of stack for each select call in the future. It's a quite big
>>>> change.
>>>
>>> The select call in LibBSD double this internally.
>>>
>>
>> Doesn't make it better.
>
> No it does not and I feel these hacks are just that hacks and average ones that
> need to be removed.
>
>>
>>>>
>>>> Another possibility would be 256 (32 bytes) like some other *BSD do.
>>>>
>>>> Maybe we should start a new discussion thread about the right FD_SETSIZE so that
>>>> more developers get involved?
>>>>
>>>>>>>> Default for FreeBSD seems to be a FD_SETSIZE of 1024. See the Notes
>>>>>>>> section in
>>>>>>>> the select man page:
>>>>>>>>
>>>>>>>> https://www.freebsd.org/cgi/man.cgi?query=select&sektion=2&manpath=freebsd-release-ports#end
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> Yes.
>>>>>>>
>>>>>>>> I think select is used in user space applications most of the time. I would
>>>>>>>> guess that FreeBSD has some file number mapping between kernel and user
>>>>>>>> space
>>>>>>>> that would allow every application to open 1024 files.
>>>>>>>
>>>>>>> Yes. There are other checks and arrays in the kernel's select that handle the
>>>>>>> size of 1024 or smaller. We should align ourselves with FreeBSD. A select
>>>>>>> call
>>>>>>> uses a lot of stack.
>>>>>>>
>>>>>>
>>>>>> I have no problem if we align more with FreeBSD. But I'm not sure how much
>>>>>> effort that would be. And we have to be careful not to increase the memory
>>>>>> usage
>>>>>> for small targets too much.
>>>>>
>>>>> We need to find a path between that sort of space optimization and the
>>>>> functionality libbsd brings. The select call is costly in terms of stack
>>>>> size as
>>>>> the kern_select call has:
>>>>>
>>>>> /*
>>>>> * The magic 2048 here is chosen to be just enough for FD_SETSIZE
>>>>> * infds with the new FD_SETSIZE of 1024, and more than enough for
>>>>> * FD_SETSIZE infds, outfds and exceptfds with the old FD_SETSIZE
>>>>> * of 256.
>>>>> */
>>>>> fd_mask s_selbits[howmany(2048, NFDBITS)];
>>>>>
>>>>> I do not know if this means we need to be more concerned about the size of
>>>>> less.
>>>>>
>>>>
>>>> Hm. They could have used FD_SETSIZE instead of the magic value with a comment...
>>>
>>> It might pay to check the surrounding code a litte more before raising something
>>> like that on the FreeBSds hackers list. There are subtle optimisations happening
>>> to make this call perform.
>>
>> I didn't plan to rise it on the FreeBSD mailing list. If I plan to fight against
>> existing magic numbers I'll start in RTEMS before I fight them in FreeBSD ;-)
>
> Yes. My point is we are porting as clean FreeBSD code as we can have and this is
> what we inherit.
>
>>>> I think a big problem for RTEMS is that we have shared file numbers for all
>>>> processes. This makes 1024 a lot less compared to starting from 0 for every
>>>> process.
>>>
>>> We are a single process so I do not follow. We are no different to other
>>> processes. We have lived with a small file set size for years until libio was
>>> changed to cycle over the fd numbers
>>
>> We started to include libbsd with some applications that are more used to
>> desktop or server environments like wpa supplicant or racoon. These applications
>> originally didn't expect to run in a single process environment but have an own
>> process for each of them.
>>
>> I'm just not sure whether increasing FD_SETSIZE is the best solution for that
>> problem. It only moves it so that it does happen in fewer applications at the
>> cost that all applications that use select and can work with the current limit
>> will need more stack.
>
> You could be right maybe the issue is not specially there but in libio.
>
>> That means: I wouldn't object to such a patch but I don't want to be the one
>> creating and suggesting it.
>
> Sure, this is your choice to make. I have raised this issue with Sebastian and
> now you and both have supported this approach to fixing select.
Because, although the kind of fix is more a hack, it can solve the
problem for these use cases. They allow libbsd to be used regardless of
the fd_set size. Of course it has the downside that it increases the
maintenance effort for libbsd (like every patch does) and therefore it
is not an ideal solution.
>
> I have given this topic some more consideration and the FD size is just a rubber
> band. The reality is the FD set needs to always be the libio size because of the
> way our select and file system are implement and this not how select has been
> defined.
Agreed: The big problem is that the FD set _should_ be the same as libio
size. But at the same time it's used for a fixed type. If that type is
used in a structure in a library, we can't base it on one of our
CONFIGURE_* macros.
Theoretically we could try to somehow define it as a pointer (or head of
a list or simmilar) and allocate space as soon as someone adds something
to the list. But I'm not sure where we could free that kind of memory.
And I'm also sure that there are quite some edge cases that wouldn't
work with that.
> As things stand select is broken and this is a bug. We cannot redefine
> select so we need to live with how it works. I think backing out libio
> incrementing fd to catch fd reuse is seriously on the table. The side effects
> out way the benefits.
How much time do you think would be necessary for this kind of change?
This sounds like a complex but very self-contained toppic. Do you think
we could convince a (good) GSoC student to do that? Of course it would
need a thourough review at such a critical location and therefore would
be quite a bit of work for the mentor too.
>
> This exmple explains the bug. Set the FD set size to 1024 to match FreeBSD, it
> cannnot be any larger. Set the libio descriptors to 2048. Open and close a file
> 1025 times. Open a single socket and call select. Select will fail and we have
> only 1 extra fd on top of stdio ones.
>
> There are of course other issues with select such as no more than 64, or 1024 if
> changing the default, fd's being open but that is historical. If it is arugued
> this is an issue that should be resolved then it should also include the fixed
> size in the FreeBSD kernel and that being raised and resolved there. After that
> we have the lack of signal support in libbsd.
>
> I feel the change you have pushed is the result of looking under the hood and
> inspecting the implenmentation to avoid a select bug.
Yes, it is. It is very specific to the implementation of select.
>
>>>>>>>>> FYI I have a major set of changes to libbsd that enables FreeBSD descriptor
>>>>>>>>> support and moves the libio support to specific interfaces.
>>>>>>>>
>>>>>>>> OK. When do you plan to add that and to which branches?
>>>>>>>
>>>>>>> I am looking 6-freebsd-12 and master. Thses changes are a major rework and
>>>>>>> not
>>>>>>> suitable for 5.
>>>>>>
>>>>>> I assumed that.
>>>>>>
>>>>>>>
>>>>>>>> I would like to add this
>>>>>>>> patch set to 5 sooner or later because it fixes a potential bug in racoon
>>>>>>>> for
>>>>>>>> that branch too.
>>>>>>>
>>>>>>> Sorry I missed this for 5. I will need to think about this on 5. I do not
>>>>>>> think
>>>>>>> it is a problem but I am not sure about it on 6 and beyond. I would prefer we
>>>>>>> consider a better fix for select that the limited 64 descriptors we currently
>>>>>>> have.
>>>>>>
>>>>>> Maybe we should take a multi step approach:
>>>>>>
>>>>>> Use the workaround like I suggested for 5 and (at the moment) for master. A
>>>>>> similar workaround is already used in quite some other locations in libbsd. So
>>>>>> it's not a new method.
>>>>>>
>>>>>> Start a ticket and a discussion for a better solution.
>>>>>
>>>>> Sounds a good approach. I am fine with the changes because it is what we
>>>>> have to
>>>>> do and this was not meant to halt the progress, rather I wanted to have the
>>>>> discussion and raise the concerns.
>>>>>
>>>>
>>>> OK. Thanks.
>>
>> If you don't add further objections, I'll create a ticket for the patch and
>> apply it in the next days. I'll create an extra ticket and mail thread for
>> applying it to 5 afterwards.
>
> Sorry I have been busy with GSoC and other things and could not get back to
> this. I see you have pushed this change so it is done.
>
> I am going NACK all similar select changes in future. At least you have some
> certianty. This includes updates to ping and ping6 on 6. The patch on the 5
> branch is OK because it fixes a bug however for 6 and libbsd master we need to
> look foward and resolve this issue. Basically for me this type of change is a
> dead end.
Currently most selects in libbsd are fixed with the workarround. But we
still have a select in wpa_supplicant and I'm sure that we get the
problem in the future too. So if you want to continue a discussion for a
clean solution, we could do that for wpa_supplicant. Note that I
currently don't have a fundet project for wpa_supplicant so that would
be hobby time.
Best regards
Christian
>
> Chris
>
--
--------------------------------------------
embedded brains GmbH
Herr Christian MAUDERER
Dornierstr. 4
82178 Puchheim
Germany
email: christian.mauderer at embedded-brains.de
phone: +49-89-18 94 741 - 18
fax: +49-89-18 94 741 - 08
Registergericht: Amtsgericht München
Registernummer: HRB 157899
Vertretungsberechtigte Geschäftsführer: Peter Rasmussen, Thomas Dörfler
Unsere Datenschutzerklärung finden Sie hier:
https://embedded-brains.de/datenschutzerklaerung/
More information about the devel
mailing list