RFC: Bdbuf transfer error handling

Fri Nov 20 13:12:55 UTC 2009

Hi!

The bdbuf facility provides a cache for block device access in RTEMS
(http://rtems.com/doxygen/cpukit/html/group__rtems__bdbuf.html).

I want to describe the current transfer error handling in the bdbuf disk device
block cache and propose some changes.

Transfer errors can be read errors (buffer is in TRANSFER state), write errors
(buffer is in TRANSFER state) or a disk delete which occurs between the release
of a buffer and the activation of the swapout task (buffer is in MODIFIED or
SYNC state).

*******************************************************************************

Read transfers are initiated due to a read or read ahead request.  The read
ahead is speculative and has no immediate user.  We have three variants which
have to deal with a read transfer error:

R1. Read Request

Here we have a user and must return a buffer.  We set the error field of the
buffer according to the transfer status.

R2. Read Ahead Request with User after the Transfer

Here we have a user and must return a buffer.

  a) We set the error field of the buffer according to the transfer status.  No
     retry happens.  This is the current approach.

  b) We do a read transfer retry.  If this fails, see R1.  Here we have an
     inconsistency since it is the duty of the block device driver to execute
     retries if necessary and useful.  The block device driver already
     indicated through the error status that it was impossible to read this
     block.

R3. Read Ahead Request and No User

We discard the buffer.  This is the current approach.
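
As a rough illustration, the three read variants in their current form (R1,
R2 a) and R3) can be condensed into one decision function.  This is only a
sketch: the names read_origin, read_error_action and on_read_transfer_error
are hypothetical and not part of the bdbuf API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only, not the RTEMS bdbuf API. */
typedef enum { READ_REQUEST, READ_AHEAD_REQUEST } read_origin;

typedef enum {
  RETURN_BUFFER_WITH_ERROR, /* set the error field, hand the buffer to the user */
  DISCARD_BUFFER            /* nobody is waiting, drop the buffer */
} read_error_action;

/* Decide what to do with a buffer whose read transfer failed. */
read_error_action on_read_transfer_error(read_origin origin, bool has_user)
{
  /* A plain read request always has a user waiting for the buffer. */
  assert(origin != READ_REQUEST || has_user);

  if (has_user) {
    /* R1 and R2 a): report the transfer status, no retry. */
    return RETURN_BUFFER_WITH_ERROR;
  }

  /* R3: speculative read ahead without a user. */
  return DISCARD_BUFFER;
}
```

Variant R2 b) would add a retry before the first return, which duplicates
work that belongs to the block device driver.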

*******************************************************************************

Write transfer errors happen in the context of the swapout task.  We have four
variants.

W1. Disk Delete with User

We set the buffer state to CACHED (this prevents transmission retries) and the
error field to an error status.  This is currently a TODO.

W2. Disk Delete without User

We discard the buffer.  The modified data is lost.  This is currently a TODO.

W3. Write Error with User

  a) We set the buffer state to MODIFIED and the error field to an error
     status.  This is the current approach.  This may lead to an infinite loop
     of transmission retries.  This buffer cannot be recycled until it has
     been successfully transmitted.

  b) We set the buffer state to CACHED (this prevents transmission retries) and
     the error field to an error status.  A read request will return the cached
     buffer unless someone recycled the buffer in the meantime.  So the
     behaviour is not deterministic and depends on the future cache usage.  The
     user cannot determine if the error field indicates a read or write error.

  c) We set the buffer state to PURGED.  A read request will lead to a read
     transfer.  This will return the block data on the media or an error
     status.  It is independent from the future cache usage.  The error field
     is used only for read transfer errors.  The modified data is lost.

W4. Write Error without User

  a) We set the buffer state to MODIFIED and the error field to an error
     status.  This is the current approach.  This may lead to an infinite loop
     of transmission retries.  This buffer cannot be recycled until it has
     been successfully transmitted.

  b) We discard the buffer.  The modified data is lost.
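
The retry problem of W3 a) and W4 a) can be shown with a small, simplified
model.  Everything here is an assumption made for illustration: demo_buf,
swapout_pass and the always failing driver stub failing_write are
hypothetical and much simpler than the real swapout task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; these are not the RTEMS bdbuf types. */
typedef enum { DEMO_MODIFIED, DEMO_TRANSFER, DEMO_CACHED } demo_state;

typedef struct {
  demo_state state;
  int        error; /* 0 on success, otherwise the last transfer status */
} demo_buf;

/* Stand-in for a driver write that fails permanently. */
int failing_write(const demo_buf *b)
{
  (void) b;
  return -1;
}

/*
 * One swapout pass with the W3 a)/W4 a) policy.  Returns true if the buffer
 * still needs to be written afterwards.
 */
bool swapout_pass(demo_buf *b, int (*write_block)(const demo_buf *))
{
  assert(write_block != NULL);

  if (b->state != DEMO_MODIFIED) {
    return false;
  }

  b->state = DEMO_TRANSFER;
  int status = write_block(b);

  if (status == 0) {
    b->state = DEMO_CACHED;
    b->error = 0;
    return false;
  }

  /* Write error: back to MODIFIED, so the next pass tries again. */
  b->state = DEMO_MODIFIED;
  b->error = status;
  return true;
}

/* Count the write attempts made during at most max_passes passes. */
int count_write_attempts(demo_buf *b, int max_passes)
{
  int attempts = 0;

  for (int i = 0; i < max_passes && b->state == DEMO_MODIFIED; ++i) {
    ++attempts;
    swapout_pass(b, failing_write);
  }

  return attempts;
}
```

With a permanently failing device every pass produces another write attempt
and the buffer never leaves the MODIFIED state, so it can never be recycled.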

*******************************************************************************

I propose that we change the error handling in W3 to strategy c) and in W4 to
strategy b).  This prevents possible infinite transmission retry loops.  The
error field in the buffer is then used only for read transfer errors, and the
behaviour after a write error is deterministic.
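
A minimal sketch of the resulting write-side policy (W1, W2, W3 c) and
W4 b)) could look like this.  The names write_failure_cause, policy_buf and
handle_write_failure are hypothetical, and modelling "discard" as a
POLICY_DISCARDED state is an assumption made only to keep the example
self-contained.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only, not the RTEMS bdbuf API. */
typedef enum { CAUSE_DISK_DELETE, CAUSE_WRITE_ERROR } write_failure_cause;

typedef enum {
  POLICY_MODIFIED,
  POLICY_CACHED,
  POLICY_PURGED,
  POLICY_DISCARDED /* assumption: stands for "buffer was discarded" */
} policy_state;

typedef struct {
  policy_state state;
  int          error; /* 0 means no error recorded */
} policy_buf;

/* Apply the proposed handling to a buffer whose write could not complete. */
void handle_write_failure(
  policy_buf          *b,
  write_failure_cause  cause,
  bool                 has_user,
  int                  status
)
{
  assert(status != 0);

  if (!has_user) {
    /* W2 and W4 b): discard, the modified data is lost. */
    b->state = POLICY_DISCARDED;
  } else if (cause == CAUSE_DISK_DELETE) {
    /* W1: CACHED prevents transmission retries, the error is reported. */
    b->state = POLICY_CACHED;
    b->error = status;
  } else {
    /* W3 c): PURGED, the error field stays reserved for read errors. */
    b->state = POLICY_PURGED;
  }
}
```

No branch returns the buffer to the MODIFIED state, so the swapout task
cannot retry the transfer indefinitely.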

Have a nice day!

-- 
Sebastian Huber, embedded brains GmbH

Address : Obere Lagerstr. 30, D-82178 Puchheim, Germany
Phone   : +49 89 18 90 80 79-6
Fax     : +49 89 18 90 80 79-9
E-Mail  : sebastian.huber at embedded-brains.de
PGP     : Public key available on request.

This message is not a business communication within the meaning of the EHUG.