rtems_semaphore_obtain
Chris Xenophontos
cxenophontos at hammers.com
Thu Mar 29 12:48:46 UTC 2007
Joel, et al,
We rebuilt the application with RTEMS 4.6.6, and will let it run for 24
hours, under the same conditions where the problem has been observed.
I'll post the results,
thanks
Chris
-----Original Message-----
From: Joel Sherrill [mailto:joel.sherrill at oarcorp.com]
Sent: Tuesday, March 27, 2007 4:03 PM
To: Eric Norum
Cc: Chris Xenophontos; 'Tom Phillips'; rtems-users at rtems.org
Subject: Re: rtems_semaphore_obtain
This was a long thread and got disconnected from the original
problem. This response is more directly to the original issue.
Comments below.
From the first email:
> We have a situation, running RTEMS 4.6.0, on a Coldfire 5208.
There were numerous bug fixes between 4.6.0 and the end of the 4.6 branch.
It is quite possible that one of them could be the culprit and any
analysis of code as suggested below would be useless. The memory
barrier patch comes to mind.
> One of the threads (tasks) in our application pends with a one-second
> timeout on a semaphore released by an ISR.
>
> The interrupt source that triggers the ISR (for this test case) runs every 3
> seconds (~3.011). This re-creates the problem more frequently.
Second, it is possible that the clock tick ISR times out the semaphore
wait just before the HW ISR releases the semaphore. After 10 hours, I
can see the clock tick ISR and the HW ISR sometimes becoming closely
aligned, and you get:
IDLE Task running
clock tick ISR -- semaphore timeout
# possibly a task other than the listening task runs
HW ISR - releases semaphore (a.k.a. signals condition)
### TASK IS NOT WAITING ANYMORE!!!
### SO THERE IS NOTHING TO DO!!!
### You are signaling a condition but no one is listening.
Task returns from sem obtain with timeout
I think Eric Norum's suggestion to use a message queue is a good
alternative. Using a timeout like 4 or 5 seconds would also probably
solve it.
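
A minimal host-side sketch of why a queue is more forgiving than a
binary semaphore plus shared globals. This is only a model, not RTEMS
code: sample_t, sample_queue_t, queue_send, queue_receive and
QUEUE_DEPTH are hypothetical names, and the model is single-threaded
with no interrupt protection. On the target, the Classic API message
queue directives (rtems_message_queue_create, rtems_message_queue_send,
rtems_message_queue_receive) would play this role and handle the
locking themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample: the three hardware values latched by the ISR. */
typedef struct {
    uint32_t reg[3];
} sample_t;

/* Assumed depth; size it for the worst-case backlog of unprocessed samples. */
#define QUEUE_DEPTH 4u

typedef struct {
    sample_t slots[QUEUE_DEPTH];
    size_t head;   /* index of the oldest queued sample */
    size_t count;  /* number of samples currently queued */
} sample_queue_t;

/* Producer side (models the ISR): copy the sample into the queue. */
bool queue_send(sample_queue_t *q, const sample_t *s)
{
    assert(q != NULL && s != NULL);
    if (q->count == QUEUE_DEPTH) {
        return false;  /* full: the caller can count the overflow */
    }
    q->slots[(q->head + q->count) % QUEUE_DEPTH] = *s;
    q->count++;
    return true;
}

/* Consumer side (models the task): take the oldest sample, if any. */
bool queue_receive(sample_queue_t *q, sample_t *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0) {
        return false;  /* empty: models a receive that timed out */
    }
    *out = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    return true;
}
```

If two interrupts are queued before the task gets to run, the task
still receives both samples, oldest first, each with its own copy of
the three register values. With a single set of globals the second
interrupt would overwrite the first.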
But as Eric also pointed out, there is skew between the time the HW ISR
occurs and you process it (tasking delays). Your timeout slops on the
end of that and it eventually accumulates where the clock and HW
interrupts align.
The discussion that diverged from this is different and requires
different commentary. Ironically I was thinking of the same thing last
week for different reasons: splitting the semaphore manager into binary
and counting to reduce minimum footprint. :)
--joel
Eric Norum wrote:
> On Mar 20, 2007, at 8:34 AM, Chris Xenophontos wrote:
>
>
>> All, thanks for the suggestions-
>>
>> The interrupt is a continuous interrupt source, does not need re-enabling.
>> It is ack'ed in the ISR upon entry, and the H/W regs are being loaded into
>> global variables.
>>
>> There almost appears to be a point, during execution of the task, at
>> which the release of a semaphore does not register. Again, this is once
>> in about 4-10 hours, with interrupt being generated every 3.001 seconds.
>>
>> I am going to start with a simple fix first, i.e, using a counting
>> semaphore, as L. Pollak indicated -
>>
>
> Be very careful with this since you're using shared variables to
> transfer data between the ISR and the task: the interrupt handler
> might overwrite the data before the task can get at all or part of
> it. The message queue approach avoids this problem.
>
> I am concerned that you may have uncovered a more serious problem.
> I don't have time to go through the semaphore code with a fine
> toothed comb and check that there are no possible race conditions --
> maybe Joel can do this when he gets back from Europe. Or perhaps
> Daron can do this for us???
>
>
>> Then we'll look into other more involved solutions
>>
>> thanks
>> Chris Xenophontos
>>
>>
>>
>> -----Original Message-----
>> From: Eric Norum [mailto:norume at aps.anl.gov]
>> Sent: Monday, March 19, 2007 12:45 PM
>> To: Chris Xenophontos
>> Cc: rtems-users at rtems.org; 'Tom Phillips'
>> Subject: Re: rtems_semaphore_obtain
>>
>> Does your hardware interrupt continuously or does the awakened task
>> have to do something to reenable the hardware?
>>
>> Although you don't say it, I presume that you are now using some sort
>> of global variable to communicate the three values read from hardware
>> from the ISR to the task.
>>
>> How about changing the ISR from using a semaphore to using a message
>> queue to send the three values to the task? This is more robust
>> since no data are lost should the task not be able to keep up for a
>> brief period because of higher priority tasks in the way.
>>
>>
>> On Mar 19, 2007, at 11:31 AM, Chris Xenophontos wrote:
>>
>>
>>> Correction,
>>>
>>> example is incorrect, myIsrSemId is passed as an rtems_id, not as a pointer,
>>>
>>>
>>> thanks Eric,
>>> cx
>>>
>>>
>>> -----Original Message-----
>>> From: Eric Norum [mailto:norume at aps.anl.gov]
>>> Sent: Monday, March 19, 2007 11:47 AM
>>> To: Chris Xenophontos
>>> Cc: rtems-users at rtems.org; 'Tom Phillips'
>>> Subject: Re: rtems_semaphore_obtain
>>>
>>> If that's really an accurate description of the code I'm amazed that
>>> it works at all.
>>> rtems_semaphore_obtain and rtems_semaphore_release take an rtems_id
>>> as their first argument. They do not take a pointer to an rtems_id
>>> as you have shown.
>>>
>>> On Mar 19, 2007, at 9:39 AM, Chris Xenophontos wrote:
>>>
>>>
>>>> Hello all,
>>>>
>>>> We have a situation, running RTEMS 4.6.0, on a Coldfire 5208.
>>>> One of the threads (tasks) in our application pends with a one-second
>>>> timeout on a semaphore released by an ISR.
>>>>
>>>> The interrupt source that triggers the ISR (for this test case) runs
>>>> every 3 seconds (~3.011). This re-creates the problem more frequently.
>>>>
>>>> Typically, the task will run and time out once per second, unless the
>>>> ISR is triggered, in which case it will recognize it and process
>>>> accordingly. The task processes critical values read from hardware that
>>>> are latched by the ISR. 99.99% of the time, it works as expected.
>>>>
>>>> As the ISR trigger "walks" close to the time of task timeout, we see
>>>> the back-to-back executions, as expected.
>>>>
>>>> However, every 10 hours or so, the task will not respond to the
>>>> semaphore released from the ISR, even though debugging clearly shows
>>>> the ISR responded to the HW interrupt and latched the hardware values.
>>>>
>>>> The effect is a dropped interrupt - the rtems_semaphore_obtain call
>>>> does not return with an RTEMS_SUCCESSFUL status -- it returns a
>>>> TIMEOUT. No other RTEMS error statuses are returned either (we check
>>>> for these well).
>>>>
>>>> The task, ISR, and semaphore_create function that we're using are
>>>> listed below. Any help appreciated!!
>>>>
>>>> Thanks
>>>> Chris Xenophontos
>>>>
>>>> ///////////// mytask//////////////////
>>>> mytask()
>>>> {
>>>>     while( 1 )
>>>>     {
>>>>         rtems_status = rtems_semaphore_obtain( &myIsrSemId,
>>>>                                                RTEMS_WAIT,
>>>>                                                100 ); // 100 ticks = 1 second
>>>>
>>>>         if(( rtems_status != RTEMS_SUCCESSFUL ) &&
>>>>            ( rtems_status != RTEMS_TIMEOUT ))
>>>>         {
>>>>             // post error status ( we NEVER see an error here )
>>>>         }
>>>>
>>>>         if( rtems_status == RTEMS_SUCCESSFUL )
>>>>         {
>>>>             // do specific processing based on semaphore released by ISR
>>>>             // ( not always seen, even though the ISR ran )
>>>>         }
>>>>         else
>>>>         {
>>>>             // specific processing based on timeout waiting for semaphore
>>>>         }
>>>>     }
>>>> }
>>>> //////////////// end mytask///////////
>>>>
>>>> the ISR is as follows
>>>> ///////////////////myIsr////////////////
>>>> myIsr()
>>>> {
>>>>     // ( code to ack the Coldfire interrupt )...
>>>>
>>>>     // code to read 3 hardware registers....
>>>>
>>>>     rtems_semaphore_release( &myIsrSemId );
>>>> }
>>>> ///////////////myIsr///////////////////
>>>>
>>>> the semaphore is created as follows, and is always created successfully:
>>>>
>>>> status = rtems_semaphore_create( "MISR", 0,
>>>> ( RTEMS_FIFO | RTEMS_NO_INHERIT_PRIORITY |
>>>> RTEMS_SIMPLE_BINARY_SEMAPHORE |
>>>> RTEMS_NO_PRIORITY_CEILING | RTEMS_LOCAL ),
>>>> RTEMS_NO_PRIORITY, &myIsrSemId )
>>>>
>>>>
>>>> mytask is created with the following attributes:
>>>> RTEMS_PREEMPT | RTEMS_NO_ASR | RTEMS_NO_TIMESLICE |
>>>> RTEMS_INTERRUPT_LEVEL(0),
>>>> RTEMS_FLOATING_POINT | RTEMS_LOCAL,
>>>>
>>>> thanks,cx
>>>>
>>> --
>>>
>>> Eric Norum <norume at aps.anl.gov>
>>> Advanced Photon Source
>>> Argonne National Laboratory
>>> (630) 252-4793
>>>
>> --
>>
>> Eric Norum <norume at aps.anl.gov>
>> Advanced Photon Source
>> Argonne National Laboratory
>> (630) 252-4793
>>
>
>