NFS 1.4 released (was Re: More NFS Stuff)
Till Straumann
strauman at slac.stanford.edu
Thu Dec 7 01:12:51 UTC 2006
I released RTEMS NFS 1.4, which contains the patch
Steven submitted yesterday. Also, the global timeout
is now changeable, as proposed.
Furthermore, 1.4 addresses the 'st_blocksize' issue
discussed a while ago.
For more information and/or downloads see
http://www.slac.stanford.edu/~strauman/rtems/nfs
-- Till
Steven Johnson wrote:
> Hi,
>
> We were getting an application crash, using the NFS daemon. It was due
> to us changing time-outs, which exacerbated a potential race condition
> in the RPC IO daemon.
>
> The details are:
>
> This is how we understand the NFS/RPC to work.
>
> In the NFS call.
> Retrieve an XACT (transaction item) from a pool of XACTs. (There is a
> message queue of these objects. If the message queue is empty, create
> a new object.)
> Set the timeouts, transaction ID and the ID of the calling thread in
> the XACT and place a pointer to it into another message queue.
> Send a TX_EVENT event to the RPC daemon.
> Wait for an RPC event.
> On receipt of the event, if the XACT is not marked as timed out, process
> the XACT input buffer and release the buffer.
> (This is where the code died, because it assumed that if we got an
> RPC event and the XACT was not timed out, then it had a valid buffer.)
> Put the XACT back into the XACT pool message queue.
>
> In the RPC daemon.
>
> Wait for RX or TX events. RX Events are generated by callback from the
> socket on receipt of data or timeout.
>
> TX_EVENT processing.
> Stage One.
> On receipt of a TX_EVENT, pull all XACTs in the message queue out of
> the queue and stick them in a list of XACTs needing processing. Mark
> them with trip set to "FIRST_TIME".
>
> Stage Two
> Ensure another list of XACTs (newList) is empty.
> Go through the list of XACTs built on receipt of TX_EVENTs.
> See if any of the XACTs has timed out (toLive < 0):
>   if so, mark the XACT as timed out and send the RPC event to the
>   thread ID of the XACT. (Here is where we add our fix: change
>   the XACT's transaction ID.)
> else
>   Send the output buffer of the XACT to the daemon's server. (If
>   the tx fails, mark the XACT as failed and send an RPC event
>   to the caller thread.)
>   If the XACT's trip time is not FIRST_TIME, then this is a
>   retransmit, so adjust the retry_period, keeping it below the
>   maximum period.
>   Now set the trip, age and retry_period of the XACT.
>   Add the XACT to the head of the "newList".
>
> Stage Three
> Sort the newList by age. Go back and wait for events.
>
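The retransmit handling in Stage Two could be sketched roughly as follows. This is only an illustration, not the code in rpcio.c: the function name `next_retry_period`, the values of `FIRST_TIME` and `MAX_PERIOD_TICKS`, and the doubling rule are all assumptions made for the sketch. The description above only says that the period is adjusted and kept below a maximum.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values, chosen only for this sketch. */
#define FIRST_TIME       0u     /* sentinel: XACT has not been sent yet */
#define MAX_PERIOD_TICKS 1000u  /* upper bound for the retry period */

/*
 * Return the retry period to use for the next transmission.
 * Assumption: each retransmit doubles the period; the result is
 * always clamped to MAX_PERIOD_TICKS.
 */
static uint32_t next_retry_period(uint32_t trip, uint32_t period)
{
    assert(period > 0u);
    if (trip != FIRST_TIME) {
        /* Retransmit: back off, guarding against integer overflow. */
        period = (period > UINT32_MAX / 2u) ? UINT32_MAX : period * 2u;
    }
    return (period > MAX_PERIOD_TICKS) ? MAX_PERIOD_TICKS : period;
}
```

With these assumed values, a first transmission keeps its configured period, while each retransmit waits longer until the cap is reached.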
> RX Event processing.
>
> Get data from the socket.
> If there is data, extract the transaction ID (xid) from the data and
> compare it to the xids stored in a hash table of the XACT objects (as
> XACTs are created they are added to the hash table).
> If we find an XACT in the hash table whose ID matches, and whose
> server address and port also match:
>   Set the XACT's ibuf to the data we have received.
>   Remove the XACT from the transaction list.
>   Change its xid.
>   Recalculate server timeouts based on how long this one took.
>   Mark the XACT as rxed good.
>   Send the RPC event to the XACT's caller thread ID.
>
> And that's about it.
>
> The problem we had was that if the XACT timed out, we sent the XACT back
> to the caller marked as bad, and the caller failed the read request. But
> if the data still came back, the XACT still matched via the xid in the
> hash table, so the XACT was marked as good and given a buffer of data,
> and an RPC event was sent to the caller thread. The next time the NFS
> call is processed, however, it grabs a new XACT, sends it to the daemon
> and then waits on an RPC_EVENT. But there is already a pending event for
> that thread, so the new XACT is processed even though its buffer is
> invalid (NULL). We got a DTLB miss processing the garbage data and
> subsequently crashed.
>
> Following is a patch on rpcio.c with the change to fix the bug.
>
> --- rpcio.c 2006-11-20 16:50:29.000000000 +1000
> +++ rpcio.c 2006-12-04 17:30:53.633652216 +1000
> @@ -1263,6 +1263,11 @@
> srv = xact->server;
>
> if (xact->tolive < 0) {
> + /* change the ID - there might still be
> + * a reply on the way. When it arrives we must not find it's ID
> + * in the hashtable
> + */
> + xact->obuf.xid += XACT_HASHS;
> /* this one timed out */
> xact->status.re_errno = ETIMEDOUT;
> xact->status.re_status = RPC_TIMEDOUT;
>
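Why adding XACT_HASHS to the xid is enough can be sketched as follows. This is a minimal model, not the rpcio.c code. It assumes (the message does not state this) that the hash table is indexed by the low bits of the xid and that XACT_HASHS is a power of two, so the bumped xid stays in the same slot but no longer equals the xid of a late reply. The value 16 and the helper names `xact_register`, `xact_lookup` and `xact_timeout` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XACT_HASHS    16u                 /* assumed table size (power of two) */
#define XACT_HASH_MSK (XACT_HASHS - 1u)   /* low bits select the slot */

typedef struct {
    uint32_t xid;        /* transaction ID sent with the request */
    int      timed_out;  /* set once the caller has been told to give up */
} Xact;

static Xact *xact_table[XACT_HASHS];

/* Store the XACT in the slot selected by the low bits of its xid. */
static void xact_register(Xact *x)
{
    assert(x != NULL);
    xact_table[x->xid & XACT_HASH_MSK] = x;
}

/* Find the XACT a reply belongs to; NULL if the full xid does not match. */
static Xact *xact_lookup(uint32_t reply_xid)
{
    Xact *x = xact_table[reply_xid & XACT_HASH_MSK];
    return (x != NULL && x->xid == reply_xid) ? x : NULL;
}

/* On timeout, bump the xid: same slot, but a late reply no longer matches. */
static void xact_timeout(Xact *x)
{
    assert(x != NULL);
    x->xid += XACT_HASHS;
    x->timed_out = 1;
}
```

In this model, a reply carrying the old xid arrives after `xact_timeout()` and is simply dropped by `xact_lookup()`, so no stale RPC event is left pending for the caller thread.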
> We are also investigating adding some new functionality to the NFS server:
>
> 1. A function to return all of the NFS/RPC statistics kept by the
> daemon, rather than just printing it out to a file.
> 2. A function to allow the default hardcoded timeouts to be changed at
> runtime. We find the current time-outs way too long. For example, in
> our application the NFS server always replies within 100us, so there is
> no point waiting for hundreds of milliseconds to time out. (Any problems
> with us adding these 2 functions?)
> 3. NFS read caching. We have identified 2 options. One is to make the
> rpcio daemon nfs-read aware and to handle the read-ahead caching in
> there on nfs-read RPC calls. The other is to do it at the NFS
> layer, but as the calls are synchronous, we would need multiple threads
> to deal with the caching without blocking the original caller and
> defeating the point of caching. Does anyone have any comments on these
> 2 approaches? Which one would be more acceptable?
>
> Steven J
> _______________________________________________
> rtems-users mailing list
> rtems-users at rtems.com
> http://rtems.rtems.org/mailman/listinfo/rtems-users
>