libblock and FAT

Claus, Ric claus at
Sun Aug 25 23:43:57 UTC 2013

I've been writing an SDHCI/MMC driver for use with the Xilinx ZYNQ.  Specifically, it is for use with SD cards, and I am using 4 GB cards from Samsung for debugging and testing.  These advertise a read rate of up to 24 MB/s.  The average rate that I'm seeing is nowhere near that, and I've traced the cause down to several points:

1) The ZYNQ's SD/SDIO host controller takes on the order of 650 us to respond with DATA_AVAILABLE after being sent a read command.  The cause of this may, of course, be the SD card and not the host controller.  However, the same card can be read and written by a Mac much faster than I can achieve with the ZYNQ and RTEMS.
2) I gather that the block size given to rtems_disk_create_phys() should be 512, which matches the card's block size.  This appears to cause all read and write ioctl sg requests to have a length of 512, preventing a device driver from taking advantage of a device's ability to handle larger amounts of contiguous data in one go.  In the SD card's case, I'm referring to the READ_MULTIPLE_BLOCK and WRITE_MULTIPLE_BLOCK commands.
3) A read request generated with the shell command 'cp /mnt/sd/<file> ./t.tmp', with <file> being less than 512 bytes long, generates at least 3 reads of the FAT block and of the directory (64 blocks in my case).  I have seen as many as 7 of these reads, but generally it seems to be 3 or 4.
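For context on point 2, this is roughly how I am registering the disk (signature as I understand it from <rtems/diskdevs.h> in the 4.10-era tree; the device number, capacity variable, and ioctl handler name are mine, not from any RTEMS example):

```c
#include <rtems.h>
#include <rtems/blkdev.h>
#include <rtems/diskdevs.h>

/* Hypothetical registration sketch: expose the card as a physical disk
 * with a 512-byte media block size.  sdhci_ioctl is my driver's ioctl
 * handler, which services RTEMS_BLKIO_REQUEST read/write requests. */
static rtems_status_code register_sd_disk(dev_t dev, uint32_t capacity_bytes)
{
    return rtems_disk_create_phys(
        dev,                   /* device number                      */
        512,                   /* media block size in bytes          */
        capacity_bytes / 512,  /* number of media blocks on the card */
        sdhci_ioctl,           /* driver ioctl handler               */
        NULL,                  /* driver private data                */
        "/dev/sd0"             /* device node name                   */
    );
}
```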

Since I haven't found much documentation describing how libblock and libfs should be used, I've been using nvdisk and the other drivers in the libblock directory as examples, along with libchip/i2c/spi-sd-card.c.  If I've missed some reading material, could someone please point it out?

Figuring out how libblock and libfs/FAT work together looks to me like a difficult job.  Could someone please give me some guidance on how to approach it?

I would like to understand why the FAT and the directory need to be read multiple times, and why, if they must be, the buffering feature of libblock doesn't seem to be taken advantage of.  Is there perhaps a cache buffer size that isn't configured properly somewhere?
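For reference, this is the set of bdbuf cache knobs I've found in confdefs.h; the macro names are my reading of the configuration headers, and the values below are purely illustrative, not recommendations:

```c
/* Sketch: application-side bdbuf cache configuration via confdefs.h.
 * Values are illustrative only. */
#define CONFIGURE_APPLICATION_NEEDS_LIBBLOCK

/* Smallest buffer the cache manages; typically the media block size. */
#define CONFIGURE_BDBUF_BUFFER_MIN_SIZE       512
/* Largest buffer; lets bdbuf hand a driver more than one media block. */
#define CONFIGURE_BDBUF_BUFFER_MAX_SIZE       (32 * 1024)
/* Total memory dedicated to the block buffer cache. */
#define CONFIGURE_BDBUF_CACHE_MEMORY_SIZE     (128 * 1024)

#define CONFIGURE_INIT
#include <rtems/confdefs.h>
```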

Since the directory is made up of 64 contiguous blocks, it would be more efficient to read it with a single read command.  What can be done to make that happen?

A number of block device drivers divide the sg length by the device's block size (as length / blocksize rather than (length + blocksize - 1) / blocksize, surprisingly), suggesting that the result could be larger than one.  Why perform this calculation if that's never the case?
