cygwin build_alias issue (and a possible workaround...)

Mon Sep 6 16:51:38 UTC 2004

Gene, Everyone,

> -----Original Message-----
> From: Gene Smith
> Sent: Saturday, September 04, 2004 3:52 PM
>
> Maybe cygwin "bash" is really somehow "ash" internally and falsely 
> identifying itself as "bash" with --version. Is there any way 
> to verify 
> that? (Just a wild theory :-), keep up the good work!)

I have downloaded and compiled the sources to both ash (ash-20040127-1) and
bash (bash-2.05b with all 7 official patches applied).

ftp://sources-redhat.mirror.redwire.net/pub/sources.redhat.com/cygwin/release/ash/
ftp://aeneas.mit.edu/pub/gnu/bash/

They are different.  ash is ash and bash is bash.

I enabled TRACE in ash; however, the output traces a lot of things, and
with multiple processes it becomes tedious to sort it all out.  I did so
anyway, and captured the error with both expr.log and the built-in trace
from ash.

Surprise!  The expr command that fails is not executed under ash, rather
/bin/bash:

configure: running /bin/bash '/home/bvacaliuc/projects/rtems/rtems-4.6.1-jtm-20040815/c/src/../../cpukit/libblock/configure' ...

Well, it shouldn't have been a surprise; it was right there in the log
output.  Anyway, bash isn't set up for tracing so easily; however, I thought
I would verify whether it was really a shell issue or, as I suspect,
something lower level.  I took Chris John's test-fail/test-pass/test script
concept and realized that all one had to do was call the failing script line
over and over until you were satisfied that it either worked or failed.

Here is a script to do that:

#!/bin/sh
iter=0
while echo "*** TEST ${iter} ***" &&
      ./fail.sh > ./fail.log 2>&1 ; do

      iter=`expr $iter + 1`;
done
echo "*** TEST ${iter} FAILS ***"
tail ./fail.log
exit 1

What you put into fail.sh is simply the command line from the failed output
log, with the "configure: running" part edited out:

-- fail.sh --
/bin/bash '/home/bvacaliuc/projects/rtems/rtems-4.6.1-jtm-20040815/c/src/../../cpukit/libblock/configure' ...
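
Pulling that command line out of a long log by hand gets tedious, so here is
a minimal sketch of doing it with sed.  It assumes the log holds each
"configure: running ..." message on a single line; the function name
extract_running_cmds and the log file name build.log are made up for
illustration.

```shell
#!/bin/sh
# Hypothetical helper: read configure output on stdin and print only the
# command lines that follow the "configure: running " prefix.
extract_running_cmds() {
    # -n suppresses default output; the p flag prints only lines where
    # the substitution (prefix removal) actually happened.
    sed -n 's/^configure: running //p'
}
```

For example, "extract_running_cmds < build.log | tail -1 > fail.sh" would put
the last such command into fail.sh, assuming the failing configure run is the
last one logged.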

Running that shows bash failing after some small number of iterations.
I changed /bin/bash to /bin/sh to see if it was a bash-specific bug.  Using
/bin/sh in fail.sh still generates the error after some number of
iterations.

The expectation is that these configure scripts could run forever without
failure.
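
Since a healthy shell would keep the loop above running forever, a capped
variant can be handy for comparing shells or binaries.  This is only a sketch
of the same loop wrapped in a function; the name stress and the iteration cap
are additions for illustration, not part of the script I actually ran.

```shell
#!/bin/sh
# stress MAX CMD [ARGS...]
# Run CMD up to MAX times, logging each run to ./fail.log.
# Returns 1 at the first failing run, 0 if all MAX runs succeed.
stress() {
    max=$1
    shift
    iter=0
    while [ "$iter" -lt "$max" ]; do
        echo "*** TEST ${iter} ***"
        "$@" > ./fail.log 2>&1 || {
            # Same reporting as the original loop: show the failing
            # iteration number and the tail of its log.
            echo "*** TEST ${iter} FAILS ***"
            tail ./fail.log
            return 1
        }
        iter=`expr $iter + 1`
    done
    echo "*** ${max} iterations without failure ***"
    return 0
}
```

It would be called as "stress 1000 ./fail.sh", once with /bin/bash and once
with /bin/sh inside fail.sh.  The cap of 1000 is arbitrary.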

One curious thing I noticed after I had recompiled my sh.exe and bash.exe
is that the failure now occurs on any one of the 'expr' pipelines in any of
the configure scripts.  For example, the error from my run above was:

configure: error: invalid package name: target-subdir

It's no longer isolated to the build_alias name.  The corresponding expr.log
output shows:

expr xtarget-subdir : .*[^-_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789] result 0 status !0

So we have a test methodology insofar as the rtems source tree is
concerned.  Basically, expr exits with a non-zero status and the shell
interprets it as a zero status return.
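
The symptom can be stated as a stand-alone check: run the same expr match
repeatedly and complain if the shell ever reports status 0.  The sketch below
has not been run against the failing cygwin setup, so there is no claim that
it triggers the problem; the function name check_expr_status is hypothetical.
The pattern is the one from expr.log, and since "xtarget-subdir" contains only
letters and a hyphen, expr finds no match, prints 0 and exits with status 1.

```shell
#!/bin/sh
# check_expr_status N
# Run the non-matching expr test N times; return 1 as soon as the shell
# reports a zero exit status for it, 0 if every run looked correct.
check_expr_status() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        expr "xtarget-subdir" : \
            '.*[^-_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789]' \
            > /dev/null
        status=$?
        if [ "$status" -eq 0 ]; then
            echo "iteration $i: shell saw status 0 from expr"
            return 1
        fi
        i=`expr $i + 1`
    done
    echo "all $n iterations saw a non-zero status"
    return 0
}
```

Something like "check_expr_status 100000" under each shell would be the
obvious first try; the iteration count is a guess.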

In looking at the TRACE output of ash, and instrumenting the modifications
of exitstatus (which is a global, by the way...), I caught some with the
following behavior:

normal command:  expr xtarget-subdir : .*[^-_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789]
eval.c(946): exitstatus = waitforjob() (1)
eval.c(662): oexitstatus = exitstatus (1)
eval.c(665): exitstatus = 0
eval.c(584): exitstatus = 0
eval.c(662): oexitstatus = exitstatus (0)eval.c(662): oexitstatus =
exitstatus (0)
eval.c
eval.c(665): exitstatus = 0
(665): exitstatus = 0

These were taken from a build run in which I added tracing only to the
modifications of exitstatus and oexitstatus (some 40-odd locations in the
code).  I am uncomfortable with these results because they show multiple
threads calling the logging routines (see the character output collisions).
A process trace of sh.exe shows that each invocation has 4 threads running.

When the output includes variable values, the trace logging makes multiple
calls to fputs(), etc.  Each call is individually thread-safe, but the
interleaved output shows multiple threads modifying exitstatus during
execution.  The situation in bash is similar: there is a global
'last_command_exit_value', though with fewer write references to it.

[At this point, I feel a need to justify why I went to the next step and
tried bash-3.0.  On the whole, it is not a pleasant thing to try to debug
an intermittent problem such as this, especially given that debugging the
build system's shells is nowhere near being on my list of important things
to do.  So, since there was another easy door to go through, I took it.]

Next, I compiled bash-3.0 (same place for sources) and put that in place of
/bin/bash.exe [/usr/bin/bash.exe is a hard link] and ran the (reduced) test
script above again.  It has run past 35 iterations so far with success.
Hmm.  Could be interesting.

So, as the next step, would Gene and Scott please download the source to
bash-3.0, compile it and put it on their systems and run the rtems
configure-build scripts again?  If you are impatient and trust me and my
server, you can get my pre-compiled binary for cygwin from here:

http://ngit.com/blog/csb350/bash.exe.gz

[~/projects/cygwin/bash-3.0] md5sum bash.exe.gz
7247d65a6acf2d71bde14c127c677edb *bash.exe.gz
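
To check a download against that checksum before unpacking it, something
along these lines should do.  It assumes an md5sum that supports the -c
option (the GNU one does); the function name verify_md5 is made up for
illustration.

```shell
#!/bin/sh
# verify_md5 FILE EXPECTED_MD5
# Returns 0 if FILE's md5 checksum equals EXPECTED_MD5, non-zero otherwise.
verify_md5() {
    # Build a checksum line in md5sum's own format ("<sum> *<file>")
    # and let md5sum -c compare it against the file on disk.
    echo "$2 *$1" | md5sum -c - > /dev/null 2>&1
}
```

For example: "verify_md5 bash.exe.gz 7247d65a6acf2d71bde14c127c677edb &&
gunzip bash.exe.gz".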

I have to stop here today.  It would have been lovely to come up with a more
generic test case that exercised the *sh issue; however, this is a team
effort, right?  :)  I will start my automated configure-build-redo script
again and see what comes of it tomorrow morning.  It will also be
interesting to see what people's results are from this...

Best Regards,

-bogdan