RFC: Bdbuf transfer error handling
Chris Johns
chrisj at rtems.org
Sun Nov 22 00:21:09 UTC 2009
Thomas Doerfler wrote:
> Chris,
>
> Chris Johns wrote:
>> Sebastian Huber wrote:
>>> R3. Read Ahead Request and No User
>>>
>>> We discard the buffer. This is the current approach.
>>>
>> As you know I wish to move the read ahead logic out of the cache into
>> the file systems. I propose to change the API to have a
>> rtems_chain_control passed in for gets and reads with the buffers
>> returned linked on the chain. This means a file system can determine
>> the number of buffers it wants and the cache will attempt to provide
>> them. If it cannot it does what it can, which could be 0 buffers
>> returned because of read errors. It is up to the file system to manage
>> this, typically with an EIO. Note, a resource issue in the cache would
>> block the requester until the request can be completed. The way to
>> return ENXIO when the device is not available is something I still
>> need to figure out.
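To restate the proposal concretely, the prototypes might look something like
the following. This is only a sketch; the count and chain arguments are the
new part and the exact names and types are not settled:

   rtems_status_code rtems_bdbuf_get (dev_t                dev,
                                      rtems_blkdev_bnum    block,
                                      size_t               count,
                                      rtems_chain_control* buffers);

   rtems_status_code rtems_bdbuf_read (dev_t                dev,
                                       rtems_blkdev_bnum    block,
                                       size_t               count,
                                       rtems_chain_control* buffers);

   rtems_status_code rtems_bdbuf_release (rtems_chain_control* buffers);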
>
> I fear that this would make the filesystems code more complicated,
> because then they are responsible for keeping track which read-ahead
> buffers they have requested.
Not in the file systems we have for RTEMS, including RFS. This is done by the
cache.
>
> Example:
>
> - You open a big file and read the first 1024 bytes.
> - the filesystem will request to read ahead from bdbuf.
> - therefore, the bdbuf read call will only return to the FS code, when
> ALL sectors are available.
This is no different from read ahead being in the cache.
> - then you process the 1024 bytes and it takes a VERY long time to do so
> (e.g. because you transfer them over a slow network connection or do
> complicated math, or send them to a slow output device or....)
> -> since these read-ahead blocks are requested and occupied by the
> file system, these blocks are not available for other caching.
>
> In this scenario, a lot of buffer space gets eaten up.
>
This is not what happens currently, nor is it what I propose for the future.
Any file system code that holds bds after it releases its internal file
system lock has a bug. The MSDOS file system currently does this and it is a
bug.
> If you change this scenario slightly, because the read data is processed
> quickly, you get a performance gain since the read-ahead requires less
> transactions between bdbuf and the storage hardware.
Sure, this is the purpose of read ahead, but it breaks down when the file
system knows it only wants 1 block. For example, take a small RFS (or ext2fs)
partition with a block size such that you only have 1 bitmap allocator for
blocks, but a read ahead of 4 blocks. Every time you allocate a block you end
up reading 3 blocks you may never need, at the cost of data already in the
cache that you may need. Also, a single cache setting for read ahead has to
fit all disk sizes, file systems and needs on a single system. That is
difficult to get right.
> So my point is that it would make sense to have the "read-ahead" buffers
> marked as "reusable".
I am sorry, I do not follow. Are you suggesting we implement read locks and
write locks on bds? That is a complication I have managed to avoid.
My suggestion is simple. Change the get, read and release calls to pass a
chain. On the get and read calls the file system asks for the number of
blocks to be returned. The cache will always attempt to return the first
block and if it cannot it returns an error code. This is part of the thread
with Sebastian. The file system needs to manage getting less data than it
asked for with a further request for data. A release call using a chain
allows a single lock of the cache to handle more than one bd. This is a win.
Currently the file system can only request single blocks even if the hardware
and cache have read more. The overhead is repeated cache lock/unlock calls
with no real gain. It is rare to have 2 users accessing the same device at
the cache level. It can happen with tools like dd and hexdump. The file
system must release all bds back to the cache before returning to the user.
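A sketch of how a file system read might use these calls, assuming the
prototypes sketched above behave as described:

   #include <errno.h>
   #include <rtems.h>
   #include <rtems/bdbuf.h>
   #include <rtems/chain.h>

   int
   fs_read_example (dev_t dev, rtems_blkdev_bnum block)
   {
     rtems_chain_control buffers;
     rtems_status_code   sc;

     rtems_chain_initialize_empty (&buffers);

     /* Ask for up to 4 blocks. The cache may return fewer, but the
      * first block is always returned or an error code is. */
     sc = rtems_bdbuf_read (dev, block, 4, &buffers);
     if (sc != RTEMS_SUCCESSFUL)
       return EIO;

     /* ... use the buffers linked on the chain ... */

     /* Release every bd back to the cache with a single lock of the
      * cache before returning to the user. */
     rtems_bdbuf_release (&buffers);
     return 0;
   }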
It is similar to the way the read and write calls work. You provide a buffer
of a specific size and the file system fills in as much data as it can,
returning the amount read or written. The user needs to handle the case where
only some of the data was processed.
A file system with control of the read size could map the amount of data read
to the amount requested by the user. For example, a user passes in a 64K
buffer to read a file that is only 1K long. The file system can request just
1K. If the file is large it could request 64K. Currently the file systems sit
in a loop doing this a block at a time, so the time is similar for the user.
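The sizing logic is simple. A sketch with made up names:

   /* Blocks to request for a user read: the smaller of the user's
    * request and what remains of the file, in whole blocks. */
   static size_t
   fs_blocks_to_request (size_t user_count,
                         size_t file_remaining,
                         size_t block_size)
   {
     size_t bytes =
       user_count < file_remaining ? user_count : file_remaining;
     return (bytes + block_size - 1) / block_size;
   }

With a 4K block size a 64K user read of a 1K file requests 1 block, while the
same read of a large file requests 16 blocks.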
> ----------------
>
> I agree that only the file system knows, if and how much read ahead
> really makes sense. OTOH, the switch from sector based bdbuf to block
> (cluster?) based bdbuf also reads bigger chunks, which partially solves
> the read-ahead requirements.
It does but we can go a step further for those file systems that can handle this.
>
> Would it make sense that the filesystems code simply passes an
> additional "hint" parameter to each read/get call and the bdbuf layer is
> again responsible to do the read-ahead (or not) and to keep track of the
> available buffers?
>
We could but it only adds complication. My proposal is to remove the logic
from the cache to lower its complexity. In the file system all it needs to do
is pop the first buffer from the chain and release the remaining buffers on
the chain. This gives the same effect as read ahead in the cache and I think
it is simpler.
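A sketch of that path, assuming the bd's chain node is its first member as it
is today so the cast is valid:

   /* The first buffer on the chain is the block the file system
    * asked for. */
   rtems_bdbuf_buffer* bd =
     (rtems_bdbuf_buffer*) rtems_chain_get (&buffers);

   /* Releasing the remainder leaves them in the cache ready for
    * later reads, which gives the read ahead effect. The file
    * system releases bd separately when it has finished with it. */
   rtems_bdbuf_release (&buffers);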
> I have already discussed with Sebastian, I think it would make sense to
> define some usage scenarios for the file systems/bdbuf/blockdev area,
In the RFS I already maintain a list of recently used and shared bds. It is a
chain that holds about 5 bds at most. I define a file system transaction as
the time from the file system being locked to being unlocked. The RFS holds
buffers only during a transaction, that is, all buffers must be released when
the file system is unlocked.
For meta-data accesses such as bitmap allocators and inodes I only want
single blocks, and for mapped data, including the map's data, the number of
blocks read would be capped. The exact number read would depend on the amount
of data requested by the user and the size of the map being read.
> so
> we can discuss the pros and cons of the different architectures from a
> common basis.
I think you will need to give me an example.
> And before you get the wrong impression: I really appreciate the great
> improvement you are doing in that area,
Yeah I agree and please do not only thank me. Sebastian is also doing a great
job. It has been so good having a peer review the code and improve it.
> it is just that we have
> different use cases in mind when doing certain design decisions and
> therefore we tend to different paths.
What file system are you using?
How does this all affect the USB disk access?
>
> With kind regards,
>
> Thomas.
>
>