rtems bdbuf question (again)

Till Straumann strauman at slac.stanford.edu
Wed Aug 1 22:12:27 UTC 2007


Just for the fun of it I was again thinking
about replacing the protection scheme of
libblock / bdbuf (which currently employs
task-preemption disabling) by a mutex approach.

(See this old thread started by this message:
http://www.rtems.org/ml/rtems-users/2002/october/msg00054.html
)

However, I got stopped in the tracks early by
what I perceive as a bug:

Assume two threads concurrently read the same block
(rtems_bdbuf_read()) which shall not be cached yet.
We also assume that enough free blocks are available.

Hence, the first thread runs w/o preemption through
  1 obtaining a buffer (find_or_assign_buffer())
  2 setting up a read request to disk
  3 initializing bd_buf's transfer semaphore
until
  4 blocking on the transfer semaphore

at this point, the first thread is suspended blocking
for the semaphore so that the second thread can
enter the critical section. The second thread enters
find_or_assign_buffer(), finds the buffer already allocated
by thread 1 and executes the code fragment

while ( bd_buf->in_progress != 0 )
{
    /* Potential for race condition if IRQ happens here */
    rtems_interrupt_disable(level)
    _CORE_mutex_Seize(&bd_buf->transfer_sema, ...)
}

If an IRQ signalling completion of the read transfer
happens at the marked point above ('in_progress' has
been set to 1 by thread #1 when scheduling the disk read transfer
[step 2 above]) then the ISR releases and flushes the transfer_sema
(bdbuf_read_transfer_done()) which eventually results in thread #1
resuming execution.
Thread #2, however, which is not yet blocked on the transfer_sema
will subsequently do so and therefore hang.

IMO the loop should be modified to

rtems_interrupt_disable()
while (bd_buf->in_progress)
{
    _CORE_mutex_Seize()
    rtems_interrupt_disable()
}
rtems_interrupt_enable()

Accordant modifications are needed at similar locations in the
code.

Besides that I'd like to note:

1) The bug (rtems_bdbuf_get) mentioned in

http://www.rtems.org/ml/rtems-users/2002/october/msg00041.html

    is still in the current code.

2) I uphold a claim made on that thread that the 'bufget_sema'
    should be released prior to the 'goto again' statement.
    OTOH it is not obvious that this statement is really needed
    since one would think that once the bufget_sema is obtained
    that there must be a buffer, either on the free or lru list and
    hence the test if ( bd_buf == NULL ) should always fail.
   
-- Till





More information about the users mailing list