NFS 1.4 released (was Re: More NFS Stuff)

Till Straumann strauman at slac.stanford.edu
Thu Dec 7 01:12:51 UTC 2006


I released RTEMS NFS 1.4, which contains the patch
Steven submitted yesterday. Also, the global timeout
is now changeable, as proposed.

Furthermore, 1.4 addresses the 'st_blocksize' issue
discussed a while ago.

For more information and/or downloads see

http://www.slac.stanford.edu/~strauman/rtems/nfs

-- Till

Steven Johnson wrote:
> Hi,
>
> We were getting an application crash when using the NFS daemon. It was
> due to us changing time-outs, which exacerbated a potential race
> condition in the RPC IO daemon.
>
> The details are:
>
> This is how we understand the NFS/RPC to work.
>
> In the NFS call:
>  Retrieve an XACT (transaction item) from a pool of XACTs. (There is a
> message queue of these objects; if the message queue is empty, create
> a new object.)
>  Set the timeouts, the transaction ID and the ID of the calling thread
> in the XACT and place a pointer to it into another message queue.
>  Send a TX_EVENT event to the RPC daemon.
>  Wait for an RPC event.
>  On receipt of the event, if the XACT is not marked as timed out,
> process the XACT input buffer and release the buffer.
> (This is where the code died, because it was believed that if we got an
> RPC event and the XACT was not timed out, then it had a valid buffer.)
>  Put the XACT back into the XACT pool message queue.
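>
> Roughly, in C, the caller side looks like this (a sketch only; the
> event numbers, queue names and the XACT layout are made up here, and
> the real rpcio.c differs in detail):
>
>   #include <rtems.h>
>   #include <stdint.h>
>   #include <stdlib.h>
>
>   #define TX_EVENT    RTEMS_EVENT_0    /* made-up event numbers       */
>   #define RPC_EVENT   RTEMS_EVENT_1
>   #define FIRST_TIME  (-1L)            /* made-up 'never sent' mark   */
>   enum { XACT_OK = 0, XACT_TIMEDOUT, XACT_SENDFAIL };
>
>   typedef struct XACT_ {
>       struct XACT_ *next;          /* daemon's pending list           */
>       uint32_t      xid;           /* transaction ID                  */
>       rtems_id      caller;        /* thread to wake on completion    */
>       long          tolive;        /* remaining lifetime in ticks     */
>       long          trip;          /* FIRST_TIME or last send time    */
>       long          age;           /* when the next retry is due      */
>       long          retry_period;  /* current retransmit interval     */
>       void         *ibuf;          /* reply buffer, NULL until RX     */
>       int           status;        /* XACT_OK / _TIMEDOUT / _SENDFAIL */
>   } XACT;
>
>   static rtems_id xactPool, xactSendQ, rpcDaemon; /* created elsewhere */
>
>   static int nfs_call(void)
>   {
>       XACT           *x;
>       size_t          sz;
>       rtems_event_set ev;
>       int             st;
>
>       /* get an XACT from the pool; create one if the pool is empty  */
>       if (rtems_message_queue_receive(xactPool, &x, &sz,
>                            RTEMS_NO_WAIT, 0) != RTEMS_SUCCESSFUL)
>           x = calloc(1, sizeof(*x));
>
>       x->caller = rtems_task_self();
>       /* ... set the xid and the timeouts, fill the output buffer ... */
>
>       /* hand it to the daemon and wake the daemon up                 */
>       rtems_message_queue_send(xactSendQ, &x, sizeof(x));
>       rtems_event_send(rpcDaemon, TX_EVENT);
>
>       /* block until the daemon signals completion (or timeout)       */
>       rtems_event_receive(RPC_EVENT, RTEMS_WAIT | RTEMS_EVENT_ANY,
>                           RTEMS_NO_TIMEOUT, &ev);
>
>       if (x->status == XACT_OK) {
>           /* process x->ibuf here, then release the buffer            */
>       }
>       st = x->status;
>       rtems_message_queue_send(xactPool, &x, sizeof(x));  /* recycle  */
>       return st;
>   }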
>
> In the RPC daemon.
>
> Wait for RX or TX events. RX events are generated by a callback from
> the socket on receipt of data, or by a timeout.
>
> TX_EVENT processing.
>    Stage One.
>    On receipt of a TX_EVENT, pull all XACTs in the message queue out of
> the queue and put them in a list of XACTs needing processing. Mark them
> with trip set to "FIRST_TIME".
>
>    Stage Two.
>    Ensure another list of XACTs (newList) is empty.
>    Go through the list of XACTs built on receipt of TX_EVENTs:
>       See if any of the XACTs has timed out (tolive < 0);
>          if so, mark the XACT as timed out and send the RPC event to
> the thread ID of the XACT. (Here is where we add the change to the
> XACT's transaction ID; see the patch below.)
>       else
>          Send the output buffer of the XACT to the daemon's server. (If
> the tx fails, mark the XACT as failed and send an RPC event to the
> caller thread.)
>          If the XACT's trip time is not FIRST_TIME, then this is a
> retransmit, so adjust the retry_period, keeping it below the maximum
> period.
>          Now set the trip, age and retry_period of the XACT.
>          Add the XACT to the head of the newList.
>
>    Stage Three.
>       Sort the newList by age.
>       Go back and wait for events.
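>
> Sketched in the same C style as above (again with made-up helper names
> and constants; the line added by the patch is marked):
>
>   #define RETRY_MAX   100             /* max backoff in ticks, made up */
>   #define XACT_HASHS  64              /* stand-in; rpcio.c defines it  */
>   static long now(void);              /* hypothetical clock helper     */
>   static int  send_obuf(XACT *x);     /* hypothetical transmit helper  */
>
>   static XACT *workList, *newList;    /* intrusive lists via ->next    */
>
>   static void tx_pass(void)
>   {
>       XACT  *x;
>       size_t sz;
>
>       /* Stage one: drain the send queue into the work list            */
>       while (rtems_message_queue_receive(xactSendQ, &x, &sz,
>                            RTEMS_NO_WAIT, 0) == RTEMS_SUCCESSFUL) {
>           x->trip  = FIRST_TIME;
>           x->next  = workList;
>           workList = x;
>       }
>
>       /* Stage two: time out or (re)transmit every pending XACT        */
>       newList = NULL;
>       while ((x = workList) != NULL) {
>           workList = x->next;
>           if (x->tolive < 0) {
>               x->xid   += XACT_HASHS;  /* the fix: a late reply can no
>                                         * longer match this ID          */
>               x->status = XACT_TIMEDOUT;
>               rtems_event_send(x->caller, RPC_EVENT);
>           } else if (send_obuf(x) < 0) {
>               x->status = XACT_SENDFAIL;
>               rtems_event_send(x->caller, RPC_EVENT);
>           } else {
>               if (x->trip != FIRST_TIME && 2 * x->retry_period <= RETRY_MAX)
>                   x->retry_period *= 2;  /* back off on retransmits     */
>               x->trip = now();
>               x->age  = x->trip + x->retry_period;
>               x->next = newList;         /* add at the head of newList  */
>               newList = x;
>           }
>       }
>
>       /* Stage three: sort newList by age (omitted), then wait again    */
>   }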
>
> RX Event processing.
>
>    Get data from the socket.
>    If there is data, extract the transaction ID (xid) from the data and
> compare it to the xids stored in a hash table of the XACT objects (as
> XACTs are created they are added to the hash table).
>       If we find an XACT in the hash table whose ID matches and whose
> server address and port also match:
>          Set the XACT's ibuf to the data we have received.
>          Remove the XACT from the transaction list.
>          Change its xid.
>          Recalculate server timeouts based on how long this one took.
>          Mark the XACT as received good.
>          Send the RPC event to the XACT's caller thread ID.
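>
> In the same sketch style (the hash-bucket detail is our guess at how
> rpcio.c works; the socket and the helper functions are made up):
>
>   /* also needs <sys/socket.h>, <netinet/in.h>, <string.h>, <rpc/rpc.h> */
>   static int   rpcSock;                      /* the daemon's UDP socket */
>   static XACT *xactHash[XACT_HASHS];         /* filled as XACTs are made */
>   static int   same_server(XACT *x, struct sockaddr_in *a); /* made up  */
>   static void *copy_reply(char *buf, ssize_t n);            /* made up  */
>
>   static void rx_pass(void)
>   {
>       struct sockaddr_in from;
>       socklen_t          flen = sizeof(from);
>       char               buf[UDPMSGSIZE];
>       ssize_t            n;
>       uint32_t           xid;
>       XACT              *x;
>
>       n = recvfrom(rpcSock, buf, sizeof(buf), 0,
>                    (struct sockaddr *)&from, &flen);
>       if (n < (ssize_t)sizeof(xid))
>           return;
>
>       memcpy(&xid, buf, sizeof(xid));       /* the xid is the first word */
>       x = xactHash[xid & (XACT_HASHS - 1)]; /* low bits pick the bucket  */
>
>       if (x && x->xid == xid && same_server(x, &from)) {
>           x->ibuf   = copy_reply(buf, n);   /* hand the data over        */
>           /* remove it from the pending list, change its xid, and update
>            * the server's round-trip-based timeout estimate ...          */
>           x->status = XACT_OK;
>           rtems_event_send(x->caller, RPC_EVENT);
>       }
>   }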
>
> And that's about it.
>
> The problem we had was that if the XACT timed out, we sent the XACT
> back to the caller marked as bad and the caller failed the read
> request. But if the data still came back, the reply still matched the
> XACT via the xid in the hash table, and the XACT was marked as good,
> given a buffer of data, and its caller was sent a second, now stale,
> RPC event. The next time the NFS call runs, however, it grabs a new
> XACT, sends it to the daemon and then waits on an RPC_EVENT; but there
> is already a pending event for that thread, so the new XACT is
> processed immediately even though its buffer is invalid (NULL). We got
> a DTLB miss processing the garbage data, and subsequently crashed.
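>
> As a timeline (our reconstruction of the interleaving):
>
>    caller:  NFS call #1, waits on RPC_EVENT
>    daemon:  tolive < 0, marks XACT #1 timed out, sends RPC_EVENT
>    caller:  wakes, sees the timeout, fails the read
>    server:  late reply for XACT #1 finally arrives
>    daemon:  xid still matches in the hash table, marks XACT #1 good,
>             sends RPC_EVENT again (now stale)
>    caller:  NFS call #2, waits on RPC_EVENT, returns at once on the
>             stale event; XACT #2 still has ibuf == NULL, so we crash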
>
> Following is a patch on rpcio.c with the change to fix the bug.
>
> --- rpcio.c    2006-11-20 16:50:29.000000000 +1000
> +++ rpcio.c    2006-12-04 17:30:53.633652216 +1000
> @@ -1263,6 +1263,11 @@
>          srv = xact->server;
>  
>          if (xact->tolive < 0) {
> +        /* change the ID - there might still be
> +         * a reply on the way. When it arrives we must not find its ID
> +         * in the hashtable
> +         */
> +          xact->obuf.xid        += XACT_HASHS;
>            /* this one timed out */
>            xact->status.re_errno  = ETIMEDOUT;
>            xact->status.re_status = RPC_TIMEDOUT;
>
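> A note on why the increment is XACT_HASHS: if, as we understand it,
> the hash bucket is taken from the low bits of the xid, then adding
> XACT_HASHS changes the ID (so a late reply no longer matches) without
> moving the XACT to a different bucket:
>
>    /* assuming XACT_HASHS is a power of two and the bucket index is
>     * the xid's low bits (our reading of rpcio.c, not verified)     */
>    bucket = xid & (XACT_HASHS - 1);
>    /* (xid + XACT_HASHS) & (XACT_HASHS - 1) == bucket still holds   */
>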
> We are also investigating adding some new functionality to the NFS server:
>
> 1. A function to return all of the NFS/RPC statistics kept by the
> daemon, rather than just printing them out to a file.
> 2. A function to allow the default hardcoded time-outs to be changed at
> runtime. We find the current time-outs way too long; for example, in
> our application NFS replies always arrive within 100 us, so there is no
> point waiting hundreds of milliseconds to time out. (Any problems with
> us adding these two functions? A sketch of what we have in mind follows
> this list.)
> 3. NFS read caching. We see two options: one is to make the rpcio
> daemon NFS-read aware and handle the read-ahead caching there, on
> NFS-read RPC calls. The other is to do it at the NFS layer, but as the
> calls are synchronous, we would need multiple threads to deal with the
> caching without blocking the original caller and defeating the point of
> caching. Does anyone have any comments on these two approaches, and
> which one would be more acceptable?
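>
> For items 1 and 2, something along these lines (hypothetical names and
> signatures; none of this exists yet; struct timeval is from
> <sys/time.h>):
>
>   /* Replace the hardcoded retry/timeout defaults at runtime.
>    * Returns 0 on success, -1 if a value is out of range.          */
>   int rpcioSetTimeouts(const struct timeval *retry_min,
>                        const struct timeval *retry_max,
>                        const struct timeval *call_timeout);
>
>   /* Copy the daemon's NFS/RPC counters into a caller-supplied
>    * structure instead of printing them to a file.                 */
>   typedef struct rpcioStats_ {
>       unsigned long requests, retransmits, timeouts, garbage;
>   } rpcioStats;
>
>   int rpcioGetStats(rpcioStats *buf);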
>
> Steven J



