rtems bdbuf question (again)
Till Straumann
strauman at slac.stanford.edu
Wed Aug 1 22:12:27 UTC 2007
Just for the fun of it I was again thinking
about replacing the protection scheme of
libblock / bdbuf (which currently employs
task-preemption disabling) by a mutex approach.
(See this old thread started by this message:
http://www.rtems.org/ml/rtems-users/2002/october/msg00054.html
)
However, I got stopped in the tracks early by
what I perceive as a bug:
Assume two threads concurrently read the same block
(rtems_bdbuf_read()) which shall not be cached yet.
We also assume that enough free blocks are available.
Hence, the first thread runs w/o preemption through
1 obtaining a buffer (find_or_assign_buffer())
2 setting up a read request to disk
3 initializing bd_buf's transfer semaphore
until
4 blocking on the transfer semaphore
at this point, the first thread is suspended blocking
for the semaphore so that the second thread can
enter the critical section. The second thread enters
find_or_assign_buffer(), finds the buffer already allocated
by thread 1 and executes the code fragment
while ( bd_buf->in_progress != 0 )
{
/* Potential for race condition if IRQ happens here */
rtems_interrupt_disable(level)
_CORE_mutex_Seize(&bd_buf->transfer_sema, ...)
}
If an IRQ signalling completion of the read transfer
happens at the marked point above ('in_progress' has
been set to 1 by thread #1 when scheduling the disk read transfer
[step 2 above]) then the ISR releases and flushes the transfer_sema
(bdbuf_read_transfer_done()) which eventually results in thread #1
resuming execution.
Thread #2, however, which is not yet blocked on the transfer_sema
will subsequently do so and therefore hang.
IMO the loop should be modified to
rtems_interrupt_disable()
while (bd_buf->in_progress)
{
_CORE_mutex_Seize()
rtems_interrupt_disable()
}
rtems_interrupt_enable()
Accordant modifications are needed at similar locations in the
code.
Besides that I'd like to note:
1) The bug (rtems_bdbuf_get) mentioned in
http://www.rtems.org/ml/rtems-users/2002/october/msg00041.html
is still in the current code.
2) I uphold a claim made on that thread that the 'bufget_sema'
should be released prior to the 'goto again' statement.
OTOH it is not obvious that this statement is really needed
since one would think that once the bufget_sema is obtained
that there must be a buffer, either on the free or lru list and
hence the test if ( bd_buf == NULL ) should always fail.
-- Till
More information about the users
mailing list