PCP implementation

Pavel Pisa ppisa4lists at pikron.com
Sun Apr 9 09:49:14 UTC 2006


Hello Till,

On Saturday 08 April 2006 05:03, Till Straumann wrote:
> Pavel Pisa wrote:
> >Hello All,
> >
> >On Saturday 08 April 2006 01:36, Martin Molnar wrote:
>
> -- snip --
>
> >>What about implementing the changing of a task's priority this way:
> >>-enforce the mutex release rule that PCP mutexes are released in the
> >>opposite order of acquisition. I think it is not a painful
> >> restriction.
>
> Sorry for not reading entirely through your suggestions but IMO we have
> to stop right here. I do not agree with the premise. I don't think that
> this restriction is acceptable.

Please try to be patient with me. I started my description from
Martin's premise in order to explain my idea, but in the end I arrive
at a suggestion for code which does not require order preservation.
I already thought about this years ago, when I first noticed the
current RTEMS solution, but my inability to keep the complexity of the
required algorithms down held me back from speaking up.

I agree that we cannot enforce something which would lead to
misbehavior of code which follows POSIX semantics and works
correctly on other systems.

But I think that the algorithms offered in my previous e-mail
should not impose any restrictions.

I have other ideas as well, which could help to speed up priority
adjustments, but their problem is that they need even more data per
mutex. My first iteration needs
  prev_acquired                       - pointer
  prev_prio or effective_prio_ceiling - priority type (a byte for RTEMS)
But if prev_acquired is replaced by a peers_acquired doubly linked
list, some operations can be sped up. There is still some overhead
if you want to fully follow the lowering of priority when an
out-of-order release is performed. This could be improved a little if
a second per-mutex priority field is added, but the worst-case
complexity is still O(n) in the number of acquired nested mutexes.
If we are speaking about n=1,2,3, then it is OK. If we are speaking
about 20 or 1000, it could be a problem.
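
Just to make the bookkeeping concrete, here is a minimal C sketch of
the per-mutex and per-task fields my first iteration would need (the
names and types are only illustrative, not the actual RTEMS structures):

  typedef unsigned char prio_t;        /* RTEMS priorities fit in a byte */

  struct pcp_task;                     /* forward declaration */

  struct pcp_mutex {
    struct pcp_task  *holder;          /* owning task, NULL if free */
    struct pcp_mutex *prev_acquired;   /* mutex the holder acquired before this one */
    prio_t            prev_prio;       /* holder's priority to restore on release */
    prio_t            ceiling;         /* static priority ceiling of this mutex */
  };

  struct pcp_task {
    prio_t            prio;            /* current (possibly escalated) priority */
    struct pcp_mutex *last_mutex;      /* head of the LIFO list of held mutexes */
  };

The LIFO list is threaded through the mutexes themselves, so the only
per-task cost is the single last_mutex pointer.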

The ultimate solution is to keep each mutex's waiters in a fully
priority-sorted queue and to hold each thread's acquired mutexes in a
tree structure sorted by their ceiling/effective-ceiling priorities.
Then we can reduce the complexity to O(log n) for waiter enqueue and
O(log n) for effective ceiling priority adjustments. But for the typical
one or two nested mutexes it would add considerable overhead. We can try
to write the code and then test it. It could even be made configurable,
because I expect that while it would make things worse for tiny
applications, it could be profitable for huge, complex ones.
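
As a rough illustration of where the O(log n) figure comes from, here is
a hedged C sketch of the data layout; prio_queue and ceiling_tree only
stand in for whatever sorted containers would really be chosen, and every
name here is hypothetical:

  typedef unsigned char prio_t;

  struct prio_queue   { void *impl; };  /* waiters kept sorted by priority */
  struct ceiling_tree { void *impl; };  /* held mutexes keyed by ceiling   */

  struct pcp_task_olog;                 /* forward declaration */

  struct pcp_mutex_olog {
    struct pcp_task_olog *holder;
    prio_t                ceiling;
    struct prio_queue     waiters;      /* O(log n) enqueue of a new waiter */
  };

  struct pcp_task_olog {
    prio_t              base_prio;
    prio_t              prio;           /* base_prio escalated by the highest
                                           ceiling among held mutexes */
    struct ceiling_tree held;           /* O(log n) insert/remove; the highest
                                           ceiling is the tree maximum */
  };

On any release, in any order, the holder's effective priority can be
recomputed from base_prio and the tree maximum instead of walking the
whole list of held mutexes.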

> Think about a scenario where you have a 
> small critical section protected by a 'fine-grained' mutex. Part of the
> critical code section is acquisition of a 'coarse' mutex:
>
>    lock(fine_grained)
>       do_critical_work()
>       lock(coarse)
>    unlock(fine_grained)
>
>       do_coarsely_protected_work()
>
>
>    lock(fine_grained)
>        unlock(coarse)
>        do_critical_cleanup()
>    unlock(fine_grained)
>
> Easy to imagine such a situation -> stack approach is not acceptable.

Yes, this is a possible use of mutexes; on the other hand, I prefer
to solve something like this through reference counters:

    lock(container)
      locate object
      atomic_inc(object->refcnt)
    unlock(container)

    examine and prepare object operation
    use lock/unlock(object) for the minimal required time

    while (1)
      oldrefcnt=object->refcnt
      if(oldrefcnt>1)
        if(CAS(&object->refcnt,oldrefcnt,oldrefcnt-1))
          break
      else
        destroy object
        break
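
For comparison, here is the same release pattern written as compilable C;
I use C11 <stdatomic.h> purely for illustration, and struct object and
destroy_object() are only hypothetical placeholders:

  #include <stdatomic.h>

  struct object {
    atomic_uint refcnt;
    /* ... payload ... */
  };

  void destroy_object(struct object *obj);  /* hypothetical */

  /* Drop one reference; whoever drops the last one destroys the object.
     Memory-ordering details are omitted to keep the sketch short. */
  static void object_put(struct object *obj)
  {
    unsigned old = atomic_load(&obj->refcnt);

    while (old > 1) {
      /* try to decrement; on failure 'old' is reloaded and we retry */
      if (atomic_compare_exchange_weak(&obj->refcnt, &old, old - 1))
        return;
    }
    /* ours was the last reference */
    destroy_object(obj);
  }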

But this can be unacceptable for some of the cases you mention,
and it is unacceptable for existing ported code.

It seems that I am too long-winded again, sorry.

Best wishes

             Pavel Pisa

> >>- for each task, maintain the "resources held set" implemented as a stack.
> >>  Every stack element (record) consists of two fields: the mutex id (to
> >>control that the mutex release rule is kept) and the task's priority after
> >>releasing that mutex.
> >
> >A stack of fixed size is bad programming practice.
> >OK, you have task stacks of fixed size, but that has already struck
> >Joel and others in the form of mysterious PC port failures.
> >
> >>For a better explanation, consider the example:
> >>- the task's normal priority is 9.
> >>1. The task acquires mutex 4 (4 is the ceiling of the mutex). Then put on the
> >>stack the element (mutex id=4, priority=current priority=9) and change
> >>the task's current priority to 4.
> >>2. The task acquires mutex 8. Then put on the stack the element (mutex id=8,
> >>priority=current priority=4). The task's current priority is not changed
> >>because it is higher than 8.
> >>
> >>Example of the stack:
> >>mutex id: 4 8 6 2 1
> >>priority: 9 4 4 4 2
> >
> >OK, I am not sure if you accounted for what has to happen
> >when a task with priority 3 wants to take over mutex 8;
> >your queue has to be adjusted to
> >
> >mutex id: 4 8 6 2 1
> >priority: 9 4 3 3 3
> >
> >no change of the actual task's priority. If a task with priority
> >1 arrives now and asks for mutex 6, the situation changes to
> >
> >mutex id: 4 8 6 2 1
> >priority: 9 4 3 1 1
> >
> >and the actual task's priority has to be adjusted.
> >
> >To build the LIFO list, I would suggest adding only
> >one pointer to the most recently acquired mutex to the task structure,
> >plus pointer and priority fields to each mutex.
> >The last piece is the identifier of the actual holding thread
> >(but that is there already).
> >A mutex, by definition, guarantees that only one thread
> >of execution can hold it. This means that no other thread
> >can use the mutex's private fields while they are assigned to the
> >actual holding thread.
> >
> >the mutex lock code
> >
> >  if mutex is available
> >    mutex->prev_acquired=task->last_mutex
> >    task->last_mutex=mutex
> >    mutex->prev_prio=task->prio
> >    mutex->holder=task->id
> >    adjust priority for ceiling
> >  else
> >    if mutex->prev_prio<thread->prio
> >      resolve mutex->holder
> >      m=holder->last_mutex
> >      while m!=mutex
> >        if m->prev_prio<thread->prio
> >          m->prev_prio=thread->prio
> >        m=m->prev_acquired
> >      if holder->prio < thread->prio
> >        escalate holder->prio to thread->prio
> >
> >>It's obvious how to release mutexes and change the task's priority.
> >
> >Simple and O(1), but I am not sure if the complexity described
> >above is worth implementing.
> >
> >The above algorithm can work even for recursive mutexes:
> >the whole acquisition of the mutex is skipped if the holder is the
> >actual task and only the recursion count is increased. This is how it
> >is implemented in RTEMS now.
> >
> >The code can be sped up a little if the priority stack is shifted
> >by one and the first stage is stored directly in the TCB.
> >This way a newly contending task is informed more quickly
> >that the actual mutex priority is already escalated and
> >gets the value of that escalation => it does not need
> >to go through the list to find that fact; the information is
> >available directly in the mutex in question.
> >
> >This code can even work when the release order differs
> >from the acquisition order:
> >
> >  if task->last_mutex == mutex
> >    task->last_mutex = mutex->prev_acquired
> >    change task->prio to mutex->prev_prio
> >       (a little more complex with the shifted priority stack)
> >  else
> >    go from task->last_mutex and find the neighbours
> >    of mutex and adjust the list accordingly;
> >    the priority adjustment is a problem in this situation,
> >    with the shifted stack it is feasible and simpler
> >
> >If you think that it is worth the complexity, I can try to help
> >to implement it. But I am not sure whether this is not something
> >which should be configurable. Good code tries to hold
> >mutexes for the minimal necessary time, does not call more functions
> >with a mutex held, etc. Otherwise, there are problems with
> >missed deadlock possibilities which creep into the code as a result
> >of adjustments to other pieces ... . So I think that for carefully
> >written code and applications the actual solution could be
> >even more profitable than a fully correct one.
> >Another possibility is to select this option on a per-mutex basis.
> >
> >Best wishes
> >
> >           Pavel Pisa



