RFC: Bdbuf transfer error handling

Chris Johns chrisj at rtems.org
Fri Nov 20 22:27:31 UTC 2009


Sebastian Huber wrote:
> Hi!
> 
> The bdbuf facility provides a cache for block device access in RTEMS
> (http://rtems.com/doxygen/cpukit/html/group__rtems__bdbuf.html).
> 

I really do like the way the state machine is created.

> I want to describe the current transfer error handling in the bdbuf disk device
> block cache and propose some changes.
> 
> Transfer errors can be read errors (buffer is in TRANSFER state), write errors
> (buffer is in TRANSFER state) or a disk delete which occurs between the release
> of a buffer and the activation of the swapout task (buffer is in MODIFIED or
> SYNC state).
> 
> *******************************************************************************
> 
> Read transfers are initiated due to a read or read ahead request.  The read
> ahead is speculative and has no immediate user.  We have three variants which
> have to deal with a read transfer error:
> 
> R1. Read Request
> 
> Here we have a user and must return a buffer.  We set the error field of the
> buffer according to the transfer status.
> 
> R2. Read Ahead Request with User after the Transfer
> 
> Here we have a user and must return a buffer.
> 
>   a) We set the error field of the buffer according to the transfer status.  No
>      retry happens.  This is the current approach.
> 
>   b) We do a read transfer retry.  If this fails, see R1.  Here we have an
>      inconsistency since it is the duty of the block device driver to execute
>      retries if necessary and useful.  The block device driver already
>      indicated through the error status that it was impossible to read this
>      block.
> 
> R3. Read Ahead Request and No User
> 
> We discard the buffer.  This is the current approach.
> 

As you know I wish to move the read ahead logic out of the cache into the file 
systems. I propose to change the API so a rtems_chain_control is passed in for 
gets and reads and the buffers returned are linked on that chain. A file system 
can then ask for the number of buffers it wants and the cache will attempt to 
provide them. If it cannot, it returns what it can, which may be 0 buffers if 
there were read errors. It is up to the file system to manage this, typically 
by returning EIO. Note, a resource issue in the cache would block the requester 
until the request can be completed. How to return ENXIO when the device is not 
available is something I still need to figure out.

This removes the need to return a buffer and therefore the need to hold an 
error state in the cache. What do you think ?
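
To make this concrete, a rough sketch of the kind of call I have in mind is 
below. The name, parameters and return convention are only illustrative and 
nothing like this exists in bdbuf today:

  #include <stddef.h>
  #include <rtems.h>
  #include <rtems/chain.h>
  #include <rtems/blkdev.h>

  /*
   * Illustrative only. The cache tries to obtain 'count' buffers starting
   * at 'block' and appends the buffers it managed to read to 'buffers'.
   * It may append fewer than requested, or none at all if the driver
   * reported read errors. A resource shortage in the cache blocks the
   * caller rather than failing the request.
   */
  rtems_status_code rtems_bdbuf_read_chain (dev_t                dev,
                                            rtems_blkdev_bnum    block,
                                            size_t               count,
                                            rtems_chain_control *buffers);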

For the sake of those who do not know, the read ahead logic is based on a 
single configuration parameter for the cache. When the file system requests a 
block the cache also reads into cache buffers a number of blocks that follow 
the requested block. This is great for reading a file because it lowers the 
number of calls to the driver and allows the hardware to burst in large blocks 
of data. However it suffers from not being able to do this selectively. If you 
tune the read ahead value up to get good file read performance you can slow 
the file system down when it accesses its meta-data or directory entries. For 
example ext2 and the RFS use inodes, which are specific data elements in 
specific blocks, and read ahead does nothing more than flush possibly useful 
blocks from the cache while slowing the inode access down. The MSDOS file 
system has similar cases.
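
With an interface like the sketch above the file system decides how far to 
read ahead: it can ask for a run of blocks when streaming file data and for a 
single block when it touches an inode or a directory entry, so the meta-data 
access no longer flushes useful buffers. A rough usage sketch, again with made 
up names and assuming the rtems_bdbuf_read_chain() prototype above:

  #include <errno.h>
  #include <rtems.h>
  #include <rtems/chain.h>
  #include <rtems/blkdev.h>

  /*
   * Illustrative only. Read a run of 'count' blocks starting at 'start'
   * and map a failed or empty read to EIO. Meta-data such as an inode
   * would be read with count set to 1 so no read ahead happens.
   */
  static int
  fs_read_block_run (dev_t dev, rtems_blkdev_bnum start, size_t count)
  {
    rtems_chain_control buffers;
    rtems_status_code   sc;

    rtems_chain_initialize_empty (&buffers);

    sc = rtems_bdbuf_read_chain (dev, start, count, &buffers);

    if (sc != RTEMS_SUCCESSFUL || rtems_chain_is_empty (&buffers))
      return -EIO;

    /* ... copy the data out, then release every buffer on the chain ... */

    return 0;
  }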

> *******************************************************************************
> 
> Write transfer errors happen in the context of the swapout task.  We have four
> variants.
> 
> W1. Disk Delete with User
> 
> We set the buffer state to CACHED (this prevents transmission retries) and the
> error field to an error status.  This is a TODO currently.
> 
> W2. Disk Delete without User
> 
> We discard the buffer.  The modified data is lost.  This is a TODO currently.
> 
> W3. Write Error with User
> 
>   a) We set the buffer state to MODIFIED and the error field to an error
>      status.  This is the current approach.  This may lead to an infinite loop
>      of transmission retries.  This buffer cannot be recycled until it has
>      been successfully transmitted.
> 
>   b) We set the buffer state to CACHED (this prevents transmission retries) and
>      the error field to an error status.  A read request will return the cached
>      buffer unless someone recycled the buffer in the meantime.  So the
>      behaviour is not deterministic and depends on the future cache usage.  The
>      user cannot determine if the error field indicates a read or write error.
> 
>   c) We set the buffer state to PURGED.  A read request will lead to a read
>      transfer.  This will return the block data on the media or an error
>      status.  It is independent from the future cache usage.  The error field
>      is used only for read transfer errors.  The modified data is lost.
> 
> W4. Write Error without User
> 
>   a) We set the buffer state to MODIFIED and the error field to an error
>      status.  This is the current approach.  This may lead to an infinite loop
>      of transmission retries.  This buffer cannot be recycled until it has
>      been successfully transmitted.
> 
>   b) We discard the buffer.  The modified data is lost.
> 
> *******************************************************************************
> 
> I propose that we change the error handling in W3 to strategy c) and in W4 to
> b).  This prevents possible infinite transmission loops.  The error field in
> the buffer is used only for read transfer errors.  After a write error we have
> a deterministic behaviour.

I agree the loop needs to be broken. I am confused about the term 'user' and 
have a few questions.

What do you mean by a 'user' ?
Who would be looking at these errors in the cache ?
What is the difference between PURGED and discarded ?

Chris



