Not quite (Cygwin build_alias bug hunt)

Bogdan Vacaliuc bvacaliuc at ngit.com
Sat Sep 18 17:26:02 UTC 2004


Hello Scott, Everyone,

[I was composing this post when I saw Scott's latest message, so I figured I would send it out now in reply.]

First of all, I want to thank Scott, Gene and Wolfram for taking the time to run previous tests and provide feedback, and of course
Pierre Humblet for debugging bash and making the patch.

As you may know from the discussions on the cygwin list, the current thinking on the problem revolves around the way PIDs are
handled in the bash source.

Pierre has analyzed the bash source and has provided a patch for the cygwin-version of bash 2.05b:

http://cygwin.com/ml/cygwin/2004-09/msg00882.html

See the end of this post for details on building the official cygwin-version of bash with this patch.  For the impatient, I have
made his pre-built bash.exe available here:
http://www.ngit.com/blog/csb350/bash.exe
$ md5sum bash.exe
9c69ff2b1c39329c48b7aba8ba9f70a4 *bash.exe
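
A small sketch of how the download could be checked against the sum above.  The helper name verify_md5 is hypothetical (it is not
part of cygwin or bash), and it assumes md5sum and cut are on the PATH:

```shell
# verify_md5 FILE EXPECTED: succeed only if FILE's MD5 sum equals EXPECTED.
# (verify_md5 is a hypothetical helper name used for illustration.)
verify_md5() {
    file=$1
    expected=$2
    # md5sum prints "<hash> <marker><name>"; keep only the hash field.
    actual=$(md5sum "$file" 2>/dev/null | cut -d' ' -f1)
    # An empty hash means md5sum could not read the file.
    [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Usage against the file above:
#   verify_md5 bash.exe 9c69ff2b1c39329c48b7aba8ba9f70a4 && echo "checksum OK"
```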

NOTE: It is not necessary to use the snapshot of the cygwin1.dll; in fact, there may be reasons to avoid it.  The patch is to bash
only and works with the stock distribution of the cygwin1.dll [you can select 'reinstall' in setup.exe to get back the stock version].

Using this shell, I have been able to perform over 500 iterations of the test script (in two separate chroot jails).  I have also
performed two simultaneous configure/builds of rtems for mips-csb350 and m68k (no bsp, according to Scott Newell's posted command
line), while at the same time running the test script.

I should also mention that I haven't played any games with sh.exe and bash.exe in my setup, since configure will automatically
choose bash.exe to run the salient parts.

I think the next step for rtems-users is to use a bash built with this patch on their systems and re-run the tests and build
environments.

[Scott, there may be configuration issues hampering the use of the snapshot and/or code compiled from vanilla bash (see below),
please follow the build instructions for the cygwin-version of bash.]

---

Pierre demonstrated how the code in bash-2.05b/execute_cmd.c::execute_command_internal() around line 650 would fail to wait on a
child process if the forked process' pid (forked in execute_simple_command() and written to the global last_made_pid) matched the
value that last_made_pid held before the fork.  Such a match is possible since (among other things) new PIDs are created during
"back tick" evaluations (those are the PIDs created with a (null) command line).

His solution was to add a sequence number to each pid and qualify the test.

The RECYCLES_PIDS switch as applied to the cygwin build was removed in the patch, as the above qualification supersedes the code
which was activated by that switch.
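
To make the failure mode concrete, here is a toy model of the test in shell.  It is not bash's C code: the function names and the
way the sequence number enters the comparison are my assumptions for illustration, and the exact form in Pierre's patch may differ.

```shell
# Toy model only; names are illustrative, not taken from the bash source.

# Unqualified test: wait only if the recorded pid changed across the command.
should_wait_by_pid() {
    pid_before=$1
    pid_after=$2
    [ "$pid_after" != "$pid_before" ]
}

# Qualified test (assumed form): every fork also bumps a sequence number,
# so a recycled pid still compares as a new child.
should_wait_by_pid_and_seq() {
    pid_before=$1
    seq_before=$2
    pid_after=$3
    seq_after=$4
    [ "$pid_after" != "$pid_before" ] || [ "$seq_after" != "$seq_before" ]
}

# Scenario: a back tick evaluation ran as pid 1234, and the next command's
# child was then handed the same pid 1234 by the OS.
if should_wait_by_pid 1234 1234; then
    echo "pid only:  wait for child"
else
    echo "pid only:  child is NOT waited for"
fi

if should_wait_by_pid_and_seq 1234 7 1234 8; then
    echo "pid + seq: wait for child"
else
    echo "pid + seq: child is NOT waited for"
fi
```

With the recycled pid, the first test skips the wait while the second still waits, which is the behaviour the qualification is
meant to restore.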

---

In more recent developments on the cygwin1.dll: Chris turned the number of PIDs that the .dll holds back to prevent reuse into a
parameter.  Extending this number (which was part of the experiment in the snapshots of 09/14/04) reduced (but did not eliminate)
the likelihood of pid reuse.

---

Regarding the 'hang' problems that people are reporting: I have observed this even before I applied the cygwin1.dll snapshot.  In
all cases, it involved some 'unofficial' combination of the cygwin1.dll, bash.exe, etc.  [I was compiling and using vanilla bash
sources instead of the sources distributed with cygwin.  Certainly the cygwin1.dll from the snapshot, when applied to a system with
much older revisions of the supporting programs, may create strange issues.]

Having said that, I experienced one case more recently, in which the test case 'hung' after ~220 iterations.  Unfortunately, I did
not have the presence of mind to attach gdb to the process, and though I sent a SEGV, I did not get a core (sigh).

In any case, my feeling is that this 'hang' is unrelated to the bash/PID/configure error we have been chasing.

---

We still have not gotten any test reports from WinXP users (except Chris Johns, who originally reported the behavioral difference).
On WinNT, the PID reuse period does not match that on Win2K, so in some reports the test script did not fail.  I'm interested in
the WinXP results because I'm watching for signs of that M$ bug I had mentioned previously.  So far there is no evidence...

---

Once again, thanks go out to all the people who have worked on this problem across the two lists.

-bogdan



p.s. To apply the above patch the following steps are necessary:

1) Obtain the bash source via the cygwin setup.exe program (check the 'source' box).  This gives you a tarball and .sh script in
/usr/src.
2) Expand the sources:

	$ cd /usr/src
	$ ./bash-2.05b-16.sh prep

3) Apply the patch (assuming you saved the patch file as ~/pids.diff)

	$ ( cd bash-2.05b ; patch < ~/pids.diff )

4) Configure and build bash

	$ ./bash-2.05b-16.sh conf
	$ ./bash-2.05b-16.sh build

The resulting bash executable will be found in /usr/src/bash-2.05b/.build/bash.exe, but it will need to be stripped and placed in
/bin/bash.exe manually.

	$ ( cd bash-2.05b ; make strip )
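
For repeated rebuilds, steps 2 through 4 and the strip can be chained so that the sequence stops at the first failure.  This is only
a sketch: build_patched_bash and run are hypothetical helper names, and it uses the patch -d/-i and make -C options in place of the
subshell cd and redirection shown above.  Copying the result to /bin/bash.exe is still left as a manual step.

```shell
# Sketch only: run and build_patched_bash are hypothetical helpers.
# Set DRY_RUN=1 to print the commands without executing them.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != 1 ]; then
        "$@"
    fi
}

build_patched_bash() {
    patchfile=${1:-$HOME/pids.diff}   # absolute path to the saved patch (step 3)
    script=./bash-2.05b-16.sh         # packaging script installed by setup.exe
    (
        # Subshell: the cd below does not leak into the calling shell.
        run cd /usr/src                         || exit 1
        run "$script" prep                      || exit 1
        run patch -d bash-2.05b -i "$patchfile" || exit 1
        run "$script" conf                      || exit 1
        run "$script" build                     || exit 1
        run make -C bash-2.05b strip            || exit 1
    )
}

# Preview the commands first, then run for real:
#   DRY_RUN=1 build_patched_bash
#   build_patched_bash
```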


> -----Original Message-----
> From: Scott Newell [mailto:newell at cei.net] 
> Sent: Saturday, September 18, 2004 1:18 PM
> To: rtems-users at rtems.com
> Subject: Not quite (Cygwin build_alias bug hunt)
> 
> 
> I tried Bogdan's suggestion of building bash from source.  
> Grabbed the bash-2.05b source and patches from:
> 
> ftp://aeneas.mit.edu/pub/gnu/bash/
> 
> and another promising looking patch mentioned on the Cygwin list:
> 
> http://www.cygwin.com/ml/cygwin/2004-09/msg00882.html
> 
> While the last patch didn't apply cleanly (something's 
> changed in the configure script, I guess), the C source mods 
> went in.  
> 
> $ patch < ../../bash_patch/diff.txt
> patching file configure
> Hunk #1 FAILED at 19323.
> 1 out of 1 hunk FAILED -- saving rejects to file configure.rej
> patching file execute_cmd.c
> patching file jobs.c
> patching file jobs.h
> patching file subst.c
> Hunk #1 succeeded at 3450 (offset 1 line).
> Hunk #2 succeeded at 3715 with fuzz 1 (offset 1 line).
> 
> I then tried Bogdan's test--ran a few hundred iterations, 
> then hung on my work machine.  Oh, this machine is still 
> running the 9-14 snapshot cygwin1.dll, which has helped.
> 
> 
> --
> newell
> 
> 



