Re: SMP for pc686 BSP

Fri May 17 02:26:07 UTC 2019

On 17/5/19 2:11 am, Jan.Sommer at dlr.de wrote:
>> -----Original Message-----
>> From: Joel Sherrill [mailto:joel at rtems.org]
>> Sent: Thursday, 16 May 2019 16:33
>> To: Sommer, Jan
>> Cc: rtems-devel at rtems.org
>> Subject: Re: SMP for pc686 BSP
>>
>>
>>
>> On Thu, May 16, 2019, 9:01 AM <Jan.Sommer at dlr.de> wrote:
>>
>>
>> 	Hello,
>>
>> 	I have been working recently to enable SMP for the pc686-BSP and
>> managed to get a basic setup running with 2 cores which currently passes
>> 44/57 smptests.
>> 	Next step is to make it work for arbitrary numbers of cores. The goal
>> is to push the patches upstream. That's why I would like to ask the more
>> experienced x86-gurus what would be the preferred direction for some
>> questions.
>>
>> 	1. If I understand it correctly the start16.S is only compiled for SMP
>> configurations since bin2boot has been removed.
>> 	Would you be ok, if I create a "startAP.S" from it, i.e. remove the
>> ifdef SMP_SECONDARY_CORE and the A20 gate enabling (is done by the
>> BSP) to have a minimal ASM-file to bring up an AP?
>>
>> 	2. According to the Intel MultiProcessor Specification the LAPIC IDs
>> do not have to be consecutive.
>> 	Thus, in smp-imps.h the "imps_cpu_apic_map" is created. It is
>> populated at the beginning, but all other code uses the LAPIC ID for
>> processor identification.
>> 	Are non-consecutive processor numbers only a theoretical issue, or
>> do they appear in real life as well?
>> 	Should I change the code to use the map or keep using the LAPIC
>> ID?
>>
>> 	3. In the current setup the AP has its own global descriptor table
>> located in start16.S, but because TLS uses the GS segment and writes to it
>> during context switches, it will create problems with multiple APs.
>> 	One idea would be to create a dedicated GDT for each AP and copy
>> the GDT of the BSP before starting each AP. For that I could
>> 	a) define an array of GDTs, but the problem is the dimension as
>> CONFIGURE_MAXIMUM_PROCESSORS is not known when compiling the
>> RTEMS libraries.
>> 	b) add it in the Per_CPU_Control structure to the cpu_per_cpu field.
>> 	c) let the BSP allocate memory for each AP's GDT using malloc
>> 	or d) I could update the existing GDT in ldsegs.S to have additional
>> GS segments for each processor core. Question is again how many sections I
>> should put there (255?) and if that would create problems for the user
>> segments (haven't checked yet where they are used).
>>
>> 	Any other ideas, remarks or hints for possible pitfalls are also
>> greatly appreciated.
>>
>>
>>
>> I am happy with anything you do to clean up this BSP, in single
>> or multiprocessor mode.
>>
>> One ticket you need to address is that the context switch is missing the
>> interlock logic that avoids issues when a thread is migrated. Without the
>> sync point, you can be restoring a thread on one core before it is
>> completely saved on another. I don't recall the ticket number.
>>
> 
> Is it this ticket?: https://devel.rtems.org/ticket/2183
> I looked at the changes from Sebastian and updated the cpu_asm.S to be closer to the one of the ARM architecture.
> I haven't been able to test that in detail yet, because of some trouble getting the APs to boot properly.
> 
> For me the biggest open question atm is 3.) regarding what to do with the GDTs. At the moment I lean towards option 3b, adding the information to the Per_CPU_Control structure, because it seems pretty easy.
> In common setups this would increase the structure by 48 bytes (8 bytes each for the NULL, text, data and gs descriptors and 2 user sections).
> Would that be a problem?

Does cpukit/include/rtems/score/percpudata.h help?

I am not sure how this is put together but the comments at the top of the file
seem to indicate this exists for this type of purpose. It looks pretty neat.
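
Wherever the per-processor GDT ends up living, the copy-and-patch step
described in 3. could look roughly like the sketch below. It is only an
illustration under assumptions: every name in it (gdt_entry, cpu_gdt,
per_cpu_gdt, GDT_GS_INDEX, MAX_CPUS, gdt_clone_for_cpu) is made up for
the example and is not taken from the pc686 BSP or from percpudata.h,
and the fixed-size array merely stands in for whatever per-CPU storage
is chosen. It assumes the six 8-byte descriptors mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the pc686 BSP's code. */
#define GDT_ENTRIES  6  /* NULL, text, data, gs, 2 user entries */
#define GDT_GS_INDEX 3  /* assumed slot of the GS (TLS) descriptor */
#define MAX_CPUS     4  /* stand-in for the configured processor maximum */

/* One x86 segment descriptor in its 8-byte in-memory layout. */
typedef struct {
  uint16_t limit_low;        /* limit bits 0..15 */
  uint16_t base_low;         /* base bits 0..15 */
  uint8_t  base_mid;         /* base bits 16..23 */
  uint8_t  access;           /* type, DPL, present */
  uint8_t  limit_high_flags; /* limit bits 16..19 and flags */
  uint8_t  base_high;        /* base bits 24..31 */
} gdt_entry;

typedef struct {
  gdt_entry entries[GDT_ENTRIES];
} cpu_gdt;

_Static_assert(sizeof(gdt_entry) == 8, "descriptor must be 8 bytes");
_Static_assert(sizeof(cpu_gdt) == 48, "six descriptors are 48 bytes");

/* Stand-in for the real per-CPU storage (e.g. a field in a per-CPU area). */
static cpu_gdt per_cpu_gdt[MAX_CPUS];

/* Scatter a 32-bit base address over the three base fields. */
static void gdt_set_base(gdt_entry *e, uint32_t base)
{
  e->base_low  = (uint16_t)(base & 0xFFFFu);
  e->base_mid  = (uint8_t)((base >> 16) & 0xFFu);
  e->base_high = (uint8_t)((base >> 24) & 0xFFu);
}

/* Reassemble the 32-bit base address from the three base fields. */
static uint32_t gdt_get_base(const gdt_entry *e)
{
  return (uint32_t)e->base_low
       | ((uint32_t)e->base_mid << 16)
       | ((uint32_t)e->base_high << 24);
}

/* Copy the boot processor's GDT for one CPU and give it its own GS base. */
static void gdt_clone_for_cpu(const cpu_gdt *boot, uint32_t cpu,
                              uint32_t tls_base)
{
  assert(cpu < MAX_CPUS);
  per_cpu_gdt[cpu] = *boot;
  gdt_set_base(&per_cpu_gdt[cpu].entries[GDT_GS_INDEX], tls_base);
}
```

Each AP would then load its own copy before it starts switching
threads, so a GS update on one core no longer touches a descriptor
another core is using. Loading the table (lgdt) and reloading GS are
left out here on purpose.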

Chris