RTEMS-Users Need help: Race condition in ATA.c

Thomas Doerfler Thomas.Doerfler at embedded-brains.de
Fri Oct 6 09:41:34 UTC 2006


currently we are experiencing a severe problem with libchip/ide/ata.c.
In rare conditions, a block transfer to/from disk is started twice,
which leads to an immediate hangup of the ata subsystem.

We are using RTEMS- with a bunch of patches (including the
memory barrier stuff) on a MPC5200 PowerPC platform and GCC4.0.1 and

I have tracked down the problem to two functions. Here are the details:

Function 1: ata_add_to_controller_queue:
static void
ata_add_to_controller_queue(rtems_device_minor_number  ctrl_minor,
                            ata_req_t                 *areq)
    Chain_Append(&ata_ide_ctrls[ctrl_minor].reqs, &areq->link);
    if (Chain_Has_only_one_node(&ata_ide_ctrls[ctrl_minor].reqs))

        ata_queue_msg_t msg;
        ATA_SEND_EVT(msg, ATA_MSG_PROCESS_NEXT_EVT, ctrl_minor, 0);

Function 2:
static inline void
ata_request_done(ata_req_t *areq, rtems_device_minor_number ctrl_minor,
                 rtems_status_code status, int error)
    preemption_key key;


    ATA_EXEC_CALLBACK(areq, status, error);

    if (!Chain_Is_empty(&ata_ide_ctrls[ctrl_minor].reqs))

Short description:

ata_add_to_controller_queue is called (from within libblock/bdbuf.c) in
the context of the task, which issues a request to the ata driver. It
adds a new data transfer request to the request queue and, if the
request queue was empty, it triggers the ATA task to wake up and handle it.

ata_request_done is called in the context of the ATA task, whenever a
request has been completely processed. Then it removes the data transfer
request from the request queue, checks whether there are further
requests to process (!Chain_Is_empty) and, if yes, immediately starts
processing the next one.

What we have found out is, that in rare conditions it seems that:

1.) ata_request_done extracts the last element from the reuqest chain

2.) then it gets preempted (!!!) and another task calls
add_to_controller_queue, which in turn adds a new request to the queue

3.) and it also sends a trigger to the ATA task to process this request

4.) when the ATA task gets processed again, handles the rest of the
"ata_request_done" code, finds a request in the chain and handles/starts
it directly.

5.) then, in the ATA main function gets the trigger to look into the
request chain, finds the (already started) data transfer request and
start processing it again which leads to the disaster.

I have found various weaknesses in the two functions:

A) add_to_controller_queue does not disable preemption, although this
would be the conforming way to ensure cherent accesses to the reuqest
chain. BUT: when this function is called, preemption is currently always
disabled (I have checked this with an added test function)

B) The chain data structure is not volatile, although it is accessed
from two/multiple tasks. BUT: I have inspected the created assembler
code, and it seems correct. The chain accesses are not moved out of the
"preemption prison" in "ata_request_done"

C) The code is not very robust. BUT: It should work.

So now here I am. What I have proven with test variables is, that
"add_to_controller_queue" is called, while the code in
"ata_request_done" is inside the "preemption prison", which can not
happen, because this should be avoided by the non-preemtive mode of the
tasks involved.

Does anybody have an idea, what might go wrong? How can it be that the
ata task gets preempted in a section of code, where preemption is disabled?


embedded brains GmbH
Thomas Doerfler           Obere Lagerstr. 30
D-82178 Puchheim          Germany
Tel. : +49-89-18 90 80 79-2
Fax  : +49-89-18 90 80 79-9
email: Thomas.Doerfler at embedded-brains.de
PGP public key available on request

More information about the users mailing list