Partition problem

Leon Pollak leonp at plris.com
Tue Oct 23 14:50:41 UTC 2007


On Tuesday 23 October 2007 16:06, Joel Sherrill wrote:
> Leon Pollak wrote:
> > Thank you, Tim.
> > The issue you describe is clear to me.
> > But reading the docs, I was sure that partitions are thread safe. Also,
> > looking into the code shows a lot of enable/disable thread dispatch.
> > Isn't this enough?
>
> Partitions are supposed to be thread safe.  They are based upon
> SuperCore chains, and the partition get and return use chain
> operations which disable interrupts during get/append operations.
>
> It should be very safe.
I am glad I was on the right track.


> Out of curiosity, if you do an objdump on your executable,
> do you see anything being moved (by optimization)
> outside interrupt or dispatching critical sections?
I build everything in debug mode (application and RTEMS), so nothing is moved out...:-)
I feel that I am on the way to finding something (although not yet) thanks to the
logging technique from Allen Kitchen. And this seems to be my own problem.
Sorry to disturb everyone, but this is some kind of Murphy's law - until I write
to the list, I do not see any way to solve the problem. :-((

Thanks a lot again for the help!

Leon
> --joel
>
> > Thanks again.
> >
> > On Tuesday 23 October 2007 15:12, Tim Cussins wrote:
> >>> I do not believe that anyone will look into the following description
> >>> of the problem, but maybe this is in the area of known problems in 4.7
> >>> and I am simply wasting my time...:-)
> >>>
> >>>
> >>>
> >>>
> >>> I have two tasks working on the same partition - taking buffers and
> >>> returning them. Buffers are all of the same size.
> >>>
> >>> When the tasks work sequentially, everything is OK.
> >>> But when I make them work in parallel (one may interrupt the other),
> >>> I very quickly get an exception in rtems_partition_get_buffer, which
> >>> shows me (after diving into the RTEMS code) that the element
> >>> the_partition->Memory.first has a garbage value (all the rest of the
> >>> structure looks normal).
> >>
> >> Sounds to me like you need to protect your accesses to the partition
> >> with a mutex...
> >>
> >> You may have run across one of the fundamental issues with
> >> multi-threading - two threads that need to share a resource (i.e. your
> >> "the_partition" structure) - if one thread is in the middle of modifying
> >> the structure and is interrupted, the structure could easily be in an
> >> invalid (garbage) state when the other thread looks at it.
> >>
> >> You'll probably need to add code to each thread that acquires a mutex
> >> (shared between the threads, i.e. "the_partition_mutex") before operating
> >> on "the_partition" using rtems_partition_get_buffer(). Of course it
> >> will have to release the mutex after it is done.
> >>
> >> In this way access to the_partition is serialised (only one thread can
> >> be modifying it at any time). Have a look on google/wikipedia for info
> >> on mutexes and critical sections.
> >>
> >>> I tried to catch this with my BDM debugger, but after one full day I
> >>> gave up - too complex to catch this at the moment of corruption as
> >>> this value changes all the time.
> >>
> >> yeah, good luck trying to catch that one! :P
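
For readers following the thread, here is a minimal sketch of the pattern Tim
describes: a pool of same-size buffers whose free list is only touched while a
shared mutex is held. It is written against plain POSIX threads, not the RTEMS
partition API, so every name in it (pool_t, pool_get, pool_return, run_demo,
POOL_BUFFERS, BUFFER_SIZE, ITERATIONS) is hypothetical and chosen only for
illustration. As Joel notes above, RTEMS partitions are supposed to be thread
safe already, so a wrapper like this should be redundant if they behave as
documented.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define POOL_BUFFERS 8       /* hypothetical number of buffers in the pool */
#define BUFFER_SIZE  64      /* hypothetical size; all buffers are equal */
#define ITERATIONS   100000  /* get/return cycles per worker thread */

/* A free buffer stores the link to the next free buffer in its own memory. */
typedef union buffer {
    union buffer *next;
    unsigned char data[BUFFER_SIZE];
} buffer_t;

typedef struct {
    buffer_t        storage[POOL_BUFFERS];
    buffer_t       *first_free;  /* head of the free list */
    pthread_mutex_t lock;        /* plays the role of "the_partition_mutex" */
} pool_t;

static void pool_init(pool_t *p)
{
    assert(p != NULL);
    p->first_free = NULL;
    for (size_t i = 0; i < POOL_BUFFERS; i++) {
        p->storage[i].next = p->first_free;
        p->first_free = &p->storage[i];
    }
    pthread_mutex_init(&p->lock, NULL);
}

/* Takes one buffer off the free list; returns NULL when the pool is empty. */
static void *pool_get(pool_t *p)
{
    pthread_mutex_lock(&p->lock);
    buffer_t *b = p->first_free;
    if (b != NULL)
        p->first_free = b->next;
    pthread_mutex_unlock(&p->lock);
    return b;
}

/* Puts a buffer back at the head of the free list. */
static void pool_return(pool_t *p, void *mem)
{
    buffer_t *b = mem;
    pthread_mutex_lock(&p->lock);
    b->next = p->first_free;
    p->first_free = b;
    pthread_mutex_unlock(&p->lock);
}

/* Each worker repeatedly takes a buffer, writes to it, and returns it. */
static void *worker(void *arg)
{
    pool_t *p = arg;
    for (int i = 0; i < ITERATIONS; i++) {
        unsigned char *buf = pool_get(p);
        if (buf != NULL) {
            buf[BUFFER_SIZE - 1] = (unsigned char)i;
            pool_return(p, buf);
        }
    }
    return NULL;
}

/* Runs two workers in parallel; returns how many buffers are free afterwards. */
int run_demo(void)
{
    static pool_t pool;
    pthread_t a, b;

    pool_init(&pool);
    pthread_create(&a, NULL, worker, &pool);
    pthread_create(&b, NULL, worker, &pool);
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    int free_count = 0;
    for (buffer_t *n = pool.first_free; n != NULL; n = n->next)
        free_count++;
    pthread_mutex_destroy(&pool.lock);
    return free_count;
}
```

Calling run_demo() from a main() should report all POOL_BUFFERS buffers back
on the free list. Removing the lock/unlock pairs from pool_get and pool_return
reintroduces the kind of race Tim describes, where one thread can observe or
overwrite first_free in the middle of the other thread's update.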


