cygwin build_alias issue (and a possible workaround...)

Bogdan Vacaliuc bvacaliuc at ngit.com
Fri Sep 3 20:05:36 UTC 2004


Everyone,

So after more experiments, I must finally conclude that neither the CSRSS
heap resources nor its scheduling priority correlates with the failure mode
of this problem we are seeing.  [I have restored my registry settings to
their original values, and I am abandoning this track and joining the 'expr'
hunt.]

Let's recap what we know.

0) The problem manifests itself on pc-cygwin build environments, not
linux-gnu build environments.
1) The problem is intermittent and occurs at different times during the
configure or the build process.
2) The problem is a (? spurious) failure return from a call to 'expr' in the
various configure scripts, where $build_alias is part of the input to
'expr'.
3) If the offending 'expr' command line is executed outside of the
configure/build scripts it never fails.
4) If strace is used on the configure/build, the failure never occurs.
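
When reading item 2, it may help to recall how 'expr' reports its status.
POSIX specifies 0 when the expression evaluates to something that is neither
null nor zero, 1 when it evaluates to null or zero, and 2 or greater for an
invalid expression or another error.  So a status of 1 is not by itself a
failure.  A minimal sketch of telling the cases apart (classify_expr is a
made-up helper name, not something taken from the configure scripts):

```shell
#!/bin/sh
# classify_expr ARGS...: run expr with ARGS and describe its exit status.
# Hypothetical helper for illustration only.
classify_expr() {
    # Capture the status without tripping shells that run with 'set -e'.
    out=$(expr "$@" 2>/dev/null) && rc=0 || rc=$?
    case $rc in
        0) echo "ok: $out" ;;            # non-null, non-zero result
        1) echo "null-or-zero: $out" ;;  # legitimate result, status 1
        *) echo "error: status $rc" ;;   # invalid expression or other error
    esac
}

classify_expr 2 + 1
classify_expr 1 - 1
classify_expr 2 +
```

Only the last kind of result would indicate that 'expr' itself rejected its
input.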

It is challenging to investigate this problem, since attempts at observing it
make the problem disappear, or occur in an 'unobserved' place.

I thought I would attempt to avoid using strace, while looking at the
inputs, outputs and status of all calls to 'expr'.  I created the following
script and placed it in ~/bin/expr:

#!/bin/sh
# Logging wrapper around the real expr: record arguments, output and status.
result=`/usr/bin/expr "$@"`
status=$?
echo "expr $* result $result status $status" >> "${HOME}/expr.log"
# Hand the output and the real exit status back to the caller.
printf '%s\n' "$result"
exit $status

After making sure that ~/bin was in the path ahead of /usr/bin, I ran the
configure and builds on both cygwin and linux-gnu.  An interesting
observation was that the mere existence of this script also causes the
configure and build scripts to execute without failure.

>>> Using this script in place of /usr/bin/expr causes my configure/build to
succeed *every time*. <<<

Although academically unsatisfying, this is (IMHO) an acceptable workaround
until the real culprit is found since the performance impact is small
compared to strace/strace --mask=minimal.
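
For anyone who wants to try the idea without touching their real ~/bin, here
is a rough sketch of the same arrangement in a scratch directory.  It is only
an illustration: the temporary directory, the log location and the wrapper
body generated here are assumptions made for the demo, and the real expr is
located with 'command -v' rather than hard-coded.

```shell
#!/bin/sh
# Demo: a logging wrapper named expr that shadows the system expr via PATH.
# A temporary directory stands in for ~/bin so nothing real is modified.
wrapdir=$(mktemp -d)
logfile="$wrapdir/expr.log"
real_expr=$(command -v expr)    # wherever the real expr lives on this system

# Generate the wrapper; escaped \$ and \` are expanded only when it runs.
cat > "$wrapdir/expr" <<EOF
#!/bin/sh
result=\`"$real_expr" "\$@"\`
status=\$?
echo "expr \$* result \$result status \$status" >> "$logfile"
printf '%s\n' "\$result"
exit \$status
EOF
chmod +x "$wrapdir/expr"

# Put the wrapper directory first, as with ~/bin ahead of /usr/bin.
PATH="$wrapdir:$PATH"
export PATH
hash -r 2>/dev/null || true     # forget any cached location of expr

resolved=$(command -v expr)
sum=$(expr 2 + 1)
echo "expr resolves to: $resolved"
echo "2 + 1 = $sum"
cat "$logfile"
```

If the first line printed does not point into the scratch directory, the
wrapper is not the one being picked up.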

I am running a series of sequential configure/builds to confirm this.  So
far I have had 8 consecutive configure/build successes.
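
A series of runs like this can be scripted.  The following is only a sketch:
run_trials is a hypothetical helper, and 'true' stands in for the actual
configure/build command line, which is not shown here.  It counts with shell
arithmetic rather than 'expr', so the harness itself does not spawn the
program under suspicion.

```shell
#!/bin/sh
# run_trials N CMD [ARGS...]: run CMD N times and count the successes.
# Hypothetical helper; each run's output goes to trial_<i>.log under
# TRIAL_LOG_DIR (defaults to the current directory).
run_trials() {
    n=$1
    shift
    ok=0
    i=1
    while [ "$i" -le "$n" ]; do
        if "$@" > "${TRIAL_LOG_DIR:-.}/trial_$i.log" 2>&1; then
            ok=$((ok + 1))
        else
            echo "run $i failed, see trial_$i.log" >&2
        fi
        i=$((i + 1))
    done
    echo "$ok of $n runs succeeded"
    # Return success only if every run succeeded.
    [ "$ok" -eq "$n" ]
}

# Stand-in command; substitute the real configure/build invocation.
TRIAL_LOG_DIR=$(mktemp -d)
run_trials 3 true
```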

Scott, Gene, everyone, does this work for all of you and is it an acceptable
(temporary) workaround?


Is it possible that the issue resides in (v)fork(), exec*, or spawn() as
called from bash, and that, as Ralf and Chris have suggested, the arguments
are somehow not making it to expr correctly?  This is difficult to know,
since with the above script in the way the problem disappears (again).
Still, the whole copy-on-write handling of fork()/exec() for processes has
been a sore point for Cygwin in the past.

Sigh.  I'm tired of this thing.

-bogdan


P.s. I was interested to see the difference (if any) between the
cygwin/Linux runs in the ~/expr.log files.  After normalizing the build
system identifier, I compared the files from the builds (both were
successful).

[~] sed s/i686-pc-cygwin/i686-pc-normalized/g < expr.log > expr_normalized_cygwin.log
[~] sed s/i686-pc-linux-gnu/i686-pc-normalized/g < /cygdrive/h/expr.log > expr_normalized_linux.log
[~] diff expr_normalized_linux.log expr_normalized_cygwin.log
1a2,5
> expr + 1 result 1 status 0
> expr + 1 result 1 status 0
> expr 2 + 1 result 3 status 0
> expr a : \(a\) result a status 0
20a25,26
> expr a.exe : [^.]*\(\..*\) result .exe status 0
> expr conftest.exe : [^.]*\(\..*\) result .exe status 0
111a118,119
> expr + 1 result 1 status 0
> expr a : \(a\) result a status 0
695c703
< expr 102 + 1 result 103 status 0
---
> expr 4688 + 1 result 4689 status 0
[~]

Which is curious, since both configure/builds are based on the exact same
rtems source code.  I have these logs if anyone is interested in them.
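
For anyone who wants to repeat the comparison, the normalize-and-diff step
can be wrapped up roughly as below.  This is a sketch: normalize_log is a
hypothetical helper, and the two tiny sample logs are made up for the demo
(only the a.exe line is copied from the diff above).

```shell
#!/bin/sh
# normalize_log FILE: replace either build triplet with a common token.
# Hypothetical helper; the substitutions mirror the two sed commands above.
normalize_log() {
    sed -e 's/i686-pc-cygwin/i686-pc-normalized/g' \
        -e 's/i686-pc-linux-gnu/i686-pc-normalized/g' "$1"
}

tmp=$(mktemp -d)
# Made-up sample logs standing in for the two expr.log files.
printf '%s\n' \
    'expr i686-pc-linux-gnu : \(.*\) result i686-pc-linux-gnu status 0' \
    > "$tmp/linux.log"
printf '%s\n' \
    'expr i686-pc-cygwin : \(.*\) result i686-pc-cygwin status 0' \
    'expr a.exe : [^.]*\(\..*\) result .exe status 0' \
    > "$tmp/cygwin.log"

normalize_log "$tmp/linux.log"  > "$tmp/linux.norm"
normalize_log "$tmp/cygwin.log" > "$tmp/cygwin.norm"

# diff exits 1 when the files differ; that is expected here.
# Keep only the added/removed lines from its output.
delta=$( { diff "$tmp/linux.norm" "$tmp/cygwin.norm" || true; } | grep '^[<>]' )
printf '%s\n' "$delta"
```

After normalization the first line of each sample log is identical, so only
the cygwin-specific a.exe call remains in the difference.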


P.p.s. Interesting reading on vfork bugs:
http://www.wlug.org.nz/vfork%282%29
http://www.google.com/search?sourceid=navclient&ie=UTF-8&q=%2Bvfork+%2Bcygwin+%2Bbug




> -----Original Message-----
> From: Bogdan Vacaliuc [mailto:bvacaliuc at ngit.com]
> Sent: Thursday, September 02, 2004 10:51 AM
> To: joel.sherrill at OARcorp.com; 'Steve Holle'
> Cc: 'Ralf Corsepius'; 'Chris Johns'; 'RTEMS Users'
> Subject: RE: cygwin build_alias issue
> 
> 
> Joel, Everyone,
> 
> Here are my results so far (though nothing experimentally conclusive;
> grrr.):
> 
> Here is a reference to a post that boils down the issue
> (circa 2002): http://www.cygwin.com/ml/cygwin/2002-02/msg01068.html
> 
> I should have referenced KB184802 in my earlier post, as it
> is more relevant to this hypothesis so far: 
> http://support.microsoft.com/default.aspx?scid=kb;EN-US;184802
> 
> That article suggests that user32.dll or kernel32.dll may
> fail to initialize (causing a process exit) for one of two 
> causes.  Cause 2 is the 'out of desktop heap' problem, which 
> we are considering as a possibility.  The failure is that
> the process exits with an exit code of 128
> (ERROR_WAIT_NO_CHILDREN).  There are *three* fields in the
> SharedSection value in the registry that control the size and
> allocation unit of each process regarding this 'desktop heap'.
> 
> It also says "Desktop heap is allocated by User32.dll when a
> process is in need of user objects. If an application is not 
> dependent on User32.dll, it will not consume desktop heap."
> 
> Using cygcheck on expr.exe, sed.exe and bash.exe shows that
> expr and sed only use kernel32.dll, but bash uses both 
> user32.dll and kernel32.dll.
> 
> When I first looked at my registry settings, they had the
> following values for SharedSection:
> 
> SharedSection=1024,3072,512,512
> 
> Notice that there were *four* entries, not two or three as
> the M$ articles describe (the fourth turns out to be the limit
> for terminal services, as mentioned in msg01068).  I made the
> following modifications, with the following observations:
> 
> SharedSection=1024,3072,128   // breaks in build every time
> SharedSection=1024,3072,1024  // runs with no apps on desktop, breaks in build with Outlook running
> SharedSection=1024,6144,512   // runs intermittently (as before), breaks in configure as well as build
> SharedSection=1024,3072,2048  // configure failed with apps on desktop (didn't do a complete test)
> 
> 
> The overriding item in all of this is that when strace is
> used, the build *never* breaks; it always succeeds, even with
> strace --mask=minimal.  What I am thinking is that the resources in
> question are in fact 'released'; however, since CSRSS.EXE 
> actually manages the release, and since new processes get 
> started before CSRSS.EXE gets a chance to run, the processes 
> spawned by configure and make 'appear' to run out of these 
> resources and fail in unpredictable ways.
> 
> Perhaps increasing the scheduling priority of CSRSS.EXE?...
> 
> More anon...
> 
> -bogdan
> 
> 
> 
> > -----Original Message-----
> > From: Joel Sherrill <joel at OARcorp.com>
> > [mailto:joel.sherrill at OARcorp.com]
> > Sent: Wednesday, September 01, 2004 5:36 PM
> > To: Steve Holle
> > Cc: Ralf Corsepius; Chris Johns; RTEMS Users
> > Subject: Re: cygwin build_alias issue
> > 
> > 
> > Steve Holle wrote:
> > > <snip>
> > > 
> > > 
> > >> FWIW: Did somebody try the "usual windows resources test", i.e.
> > >> reboot, close all apps on the desktop, open a window and run the
> > >> test, open another window and reiterate the test, continue until the
> > >> system collapses or the test fails in a different way?
> > >
> > >
> > > Steve Strobel has tried rebooting and closing all apps.  Sometimes it
> > > works and sometimes it doesn't.  He has a pretty bare bones windows
> > > setup while I have lots of gadgets loaded in the system tray.  I've
> > > built rtems with a number of apps and sometimes other cygwin windows
> > > open with no problem.
> >
> > I have tried the registry suggestion and increased both numbers by
> > 1024. I had one good run, then one with the same failure.  So unless
> > the numbers need to be even larger, that isn't it.
> >
> > It does seem rather spurious but I am noticing that the failure seems
> > to always be in tools/cpu/generic/configure for me.  Both before and
> > after the registry change.
> > 
> > --joel
> > 
> > 
> 
> 



