RFC: Bdbuf transfer error handling

Sat Nov 21 10:09:28 UTC 2009

Chris Johns wrote:
> Sebastian Huber wrote:
>> Write transfer errors happen in the context of the swapout task.  We have
>> four variants.
>>
>> W1. Disk Delete with User
>>
>> We set the buffer state to CACHED (this prevents transmission retries) and
>> the error field to an error status.  This is a TODO currently.
>>
>> W2. Disk Delete without User
>>
>> We discard the buffer.  The modified data is lost.  This is a TODO currently.
>>
>> W3. Write Error with User
>>
>>   a) We set the buffer state to MODIFIED and the error field to an error
>>      status.  This is the current approach.  This may lead to an infinite
>>      loop of transmission retries.  This buffer cannot be recycled until it
>>      has been successfully transmitted.
>>
>>   b) We set the buffer state to CACHED (this prevents transmission retries)
>>      and the error field to an error status.  A read request will return the
>>      cached buffer unless someone recycled the buffer in the meantime.  So
>>      the behaviour is not deterministic and depends on the future cache
>>      usage.  The user cannot determine if the error field indicates a read
>>      or write error.
>>
>>   c) We set the buffer state to PURGED.  A read request will lead to a read
>>      transfer.  This will return the block data on the media or an error
>>      status.  It is independent from the future cache usage.  The error
>>      field is used only for read transfer errors.  The modified data is
>>      lost.
>>
>> W4. Write Error without User
>>
>>   a) We set the buffer state to MODIFIED and the error field to an error
>>      status.  This is the current approach.  This may lead to an infinite
>>      loop of transmission retries.  This buffer cannot be recycled until it
>>      has been successfully transmitted.
>>
>>   b) We discard the buffer.  The modified data is lost.
>>
>> ******************************************************************************* 
>>
>>
>> I propose that we change the error handling in W3 to strategy c) and in W4
>> to b).  This prevents possible infinite transmission loops.  The error field
>> in the buffer is used only for read transfer states.  After a write error we
>> have a deterministic behaviour.
>
> I agree the loop needs to be broken. I am confused about the term 
> 'user' and have a few questions.
>
> What do you mean by a 'user' ?

A user is someone who waits for this buffer (we have bd->waiters > 0).

> Who would be looking at these errors in the cache ?

Since the write transfer is decoupled from the buffer release via the 
swapout task we have no error indication at release time. Maybe we can 
introduce a statistic which registers error conditions.
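
Such a statistic could be as small as a pair of counters that the swapout
task updates on each failed write.  The following is only a sketch of the
idea: the type wr_error_stats and the function wr_error_stats_record() are
hypothetical names made up for illustration and are not part of the bdbuf
code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical counters, not part of the existing bdbuf code. */
typedef struct {
  uint32_t write_errors;  /* number of failed write transfers */
  uint32_t buffers_lost;  /* failed writes whose modified data was dropped */
} wr_error_stats;

/* Record one failed write transfer; data_lost is true if the modified
   buffer content was given up (buffer discarded or purged). */
void wr_error_stats_record(wr_error_stats *stats, bool data_lost)
{
  ++stats->write_errors;
  if (data_lost) {
    ++stats->buffers_lost;
  }
}
```

An application could then poll these counters to notice write errors that it
cannot see at release time.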

> What is the difference between PURGED and discarded ?

Discard means that the buffer will be removed from the AVL tree and 
prepended to the LRU chain. This is only possible when nobody waits for it.

The PURGED state indicates that this buffer is in the AVL tree. It is 
not in the LRU chain, since it has a user. This user may be a sync 
waiter or a read/get waiter. A read request will trigger a read transfer.
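
To make the distinction concrete, here is a minimal sketch of the handling
I propose (W1, W2, W3 c) and W4 b)).  All types and names in it (buf,
buf_state, buf_write_failed(), the in_avl_tree and in_lru_chain flags) are
simplified and hypothetical; they only model the description above and are
not the actual bdbuf data structures.

```c
#include <stdbool.h>

/* Hypothetical, simplified buffer model for illustration only. */
typedef enum {
  BUF_STATE_EMPTY,    /* discarded: out of the AVL tree, back in the LRU chain */
  BUF_STATE_CACHED,
  BUF_STATE_MODIFIED,
  BUF_STATE_PURGED    /* stays in the AVL tree; a read triggers a read transfer */
} buf_state;

typedef struct {
  buf_state state;
  int       error;        /* 0 means no error */
  unsigned  waiters;      /* a "user" exists if waiters > 0 */
  bool      in_avl_tree;
  bool      in_lru_chain;
} buf;

/* Apply the proposed policy after a failed write transfer.
   disk_deleted selects the "Disk Delete" variants (W1/W2),
   otherwise the "Write Error" variants (W3/W4) apply. */
void buf_write_failed(buf *bd, bool disk_deleted, int status)
{
  if (bd->waiters == 0) {
    /* W2 and W4 b): nobody waits, so discard.  The modified data is lost. */
    bd->state = BUF_STATE_EMPTY;
    bd->error = 0;
    bd->in_avl_tree = false;
    bd->in_lru_chain = true;
  } else if (disk_deleted) {
    /* W1: CACHED prevents transmission retries; report the error status. */
    bd->state = BUF_STATE_CACHED;
    bd->error = status;
  } else {
    /* W3 c): PURGED.  The error field stays reserved for read errors. */
    bd->state = BUF_STATE_PURGED;
    bd->error = 0;
  }
}
```

In no branch does the buffer return to MODIFIED, so the swapout task never
retries the transfer.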

>
> Chris
>

-- 
Sebastian Huber, Embedded Brains GmbH

Address : Obere Lagerstr. 30, D-82178 Puchheim, Germany
Phone   : +49 89 18 90 80 79-6
Fax     : +49 89 18 90 80 79-9
E-Mail  : sebastian.huber at embedded-brains.de
PGP     : Public key available on request

This message is not a business communication within the meaning of the EHUG.