Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embedded NULL failure #36

Open
skaller opened this issue Sep 19, 2018 · 17 comments
Open

Embedded NULL failure #36

skaller opened this issue Sep 19, 2018 · 17 comments

Comments

@skaller
Copy link
Member

skaller commented Sep 19, 2018

Building Felix:

I haven't seen this before, I have changed the build process a bit.

It looks like a bug in Python.. but why didn't it happen before?

Appveyor CI:

set PATH=C:\Python35-x64;c:\ocaml\bin;%PATH%
python --version
Python 3.5.4

etermining platform : {'windows', 'win32'}
looking for cl.exe : ok C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\amd64\cl.exe
checking cl.exe : Traceback (most recent call last):
File "fbuild\fbuild-light", line 11, in
main()
File "C:\projects\felix-e3h83\fbuild\lib\fbuild\main.py", line 179, in main
result = build(ctx)
File "C:\projects\felix-e3h83\fbuild\lib\fbuild\main.py", line 104, in build
target.function(ctx)
File "C:\projects\felix-e3h83\fbuildroot.py", line 585, in build
phases = configure(ctx)
File "C:\projects\felix-e3h83\fbuild\lib\fbuild\db_init_.py", line 121, in call
result, srcs, dsts = self.call(*args, **kwargs)
File "C:\projects\felix-e3h83\fbuild\lib\fbuild\db_init_.py", line 125, in call
return ctx.db.call(self.function, ctx, *args, **kwargs)
File "C:\projects\felix-e3h83\fbuild\lib\fbuild\db\database.py", line 165, in call
call_result = function(*args, **kwargs)
File "C:\projects\felix-e3h83\fbuildroot.py", line 558, in configure
build = config_build(ctx)
File "C:\projects\felix-e3h83\fbuildroot.py", line 328, in config_build
flags=ctx.options.build_c_flags),
File "C:\projects\felix-e3h83\fbuildroot.py", line 258, in make_c_builder
static=call('fbuild.builders.c.guess_static', ctx, *args, **kwargs),
File "C:\projects\felix-e3h83\fbuild\lib\fbuild\functools.py", line 32, in call
return import_function(function)(*args, **kwargs)
File "C:\projects\felix-e3h83\fbuild\lib\fbuild\builders\c_init_.py", line 457, in guess_static
), *args, **kwargs)
File "C:\projects\felix-e3h83\fbuild\lib\fbuild\builders\c_init_.py", line 394, in guess_builder
return fbuild.functools.call(function, ctx, exe, *args, **new_kwargs)
File "C:\projects\felix-e3h83\fbuild\lib\fbuild\functools.py", line 32, in call
return import_function(function)(*args, **kwargs)
File "C:\projects\felix-e3h83\fbuild\lib\fbuild\builders\c\msvc.py", line 531, in static
compiler=Compiler(ctx, Cl(ctx, **kwargs),
File "C:\projects\felix-e3h83\fbuild\lib\fbuild\db_init
.py", line 68, in call
*args, **kwargs)
File "C:\projects\felix-e3h83\fbuild\lib\fbuild\db\database.py", line 165, in call
call_result = function(*args, **kwargs)
File "C:\projects\felix-e3h83\fbuild\lib\fbuild\db_init_.py", line 64, in call_super
return super().call(*args, **kwargs)
File "C:\projects\felix-e3h83\fbuild\lib\fbuild\builders\c\msvc.py", line 38, in init
if not self.check_flags(flags):
File "C:\projects\felix-e3h83\fbuild\lib\fbuild\builders\c\msvc.py", line 116, in check_flags
self([src], flags=flags, quieter=1, cwd=src.parent)
File "C:\projects\felix-e3h83\fbuild\lib\fbuild\builders\c\msvc.py", line 103, in call
return self.ctx.execute(cmd, msg2=msg2, **kwargs)
File "C:\projects\felix-e3h83\fbuild\lib\fbuild\context.py", line 214, in execute
**kwargs)
File "C:\Python35-x64\lib\subprocess.py", line 676, in init
restore_signals, start_new_session)
File "C:\projects\felix-e3h83\fbuild\lib\fbuild\subprocess\killableprocess.py", line 182, in _execute_child
winprocess.EnvironmentBlock(env),
File "C:\projects\felix-e3h83\fbuild\lib\fbuild\subprocess\winprocess.py", line 129, in init
self.as_parameter = LPCWSTR("\0".join(values))
ValueError: embedded null character
NMAKE : fatal error U1077: 'C:\Python35-x64\python.EXE' : return code '0x1'
Stop.

@refi64
Copy link
Contributor

refi64 commented Sep 19, 2018

This was a Python bug that's been fixed in both CPython and Fbuild. I'll put the fix in Felix's copy later!

@skaller
Copy link
Member Author

skaller commented Sep 20, 2018

I tried the code in fbuild. That fails too! Using this:

            self._as_parameter_ = LPCWSTR("\0".join(values).encode("utf-16le"))

I get

 File "C:\projects\felix\fbuild\lib\fbuild\subprocess\winprocess.py", line 129, in __init__
    self._as_parameter_ = LPCWSTR("\0".join(values).encode("utf-16le"))
TypeError: unicode string or integer address expected instead of bytes instance
NMAKE : fatal error U1077: 'C:\Python35-x64\python.EXE' : return code '0x1'

I also tried that latest fbuild as a whole, and that fails much earlier
with "deprecated API". It was talking about the command line parsing.
Not sure why a deprecated API would cause a bomb, but it did.

What bothers me is that this all worked until the latest major commit.
See:

https://ci.appveyor.com/project/skaller/felix/history

The winprocess.py code has not changed. Interscript was removed.
The configuration is no longer generated, a predefined config is now
just copied. So there are some extra copies done by scanning certain
subdirectories of src/config.

@refi64
Copy link
Contributor

refi64 commented Sep 20, 2018

LPCWSTR("\0".join(values).encode("utf-16le"))

You mixed up the fix a bit... It should be LPCSTR, not LPCWSTR.

AppVeyor's Python likely got upgraded to a version that has the bug.

@skaller
Copy link
Member Author

skaller commented Sep 20, 2018

Ooops...:-) Thanks! Will report back. I'm using Python 3.5 on Appveyor.

@skaller
Copy link
Member Author

skaller commented Sep 20, 2018

Well with the fbuild version I now get this:

checking cl.exe          : command failed: 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\amd64\cl.exe' /nologo /Ox /IC:\projects\felix-e3h83\tmp\tmp4gqiagkb C:\projects\felix-e3h83\tmp\tmp4gqiagkb\temp.c
Traceback (most recent call last):
  File "fbuild\fbuild-light", line 11, in <module>
    main()
  File "C:\projects\felix-e3h83\fbuild\lib\fbuild\main.py", line 179, in main
    result = build(ctx)
  File "C:\projects\felix-e3h83\fbuild\lib\fbuild\main.py", line 104, in build
    target.function(ctx)
  File "C:\projects\felix-e3h83\fbuildroot.py", line 585, in build
    phases = configure(ctx)
  File "C:\projects\felix-e3h83\fbuild\lib\fbuild\db\__init__.py", line 121, in __call__
    result, srcs, dsts = self.call(*args, **kwargs)
  File "C:\projects\felix-e3h83\fbuild\lib\fbuild\db\__init__.py", line 125, in call
    return ctx.db.call(self.function, ctx, *args, **kwargs)
  File "C:\projects\felix-e3h83\fbuild\lib\fbuild\db\database.py", line 165, in call
    call_result = function(*args, **kwargs)
  File "C:\projects\felix-e3h83\fbuildroot.py", line 558, in configure
    build = config_build(ctx)
  File "C:\projects\felix-e3h83\fbuildroot.py", line 328, in config_build
    flags=ctx.options.build_c_flags),
  File "C:\projects\felix-e3h83\fbuildroot.py", line 258, in make_c_builder
    static=call('fbuild.builders.c.guess_static', ctx, *args, **kwargs),
  File "C:\projects\felix-e3h83\fbuild\lib\fbuild\functools.py", line 32, in call
    return import_function(function)(*args, **kwargs)
  File "C:\projects\felix-e3h83\fbuild\lib\fbuild\builders\c\__init__.py", line 457, in guess_static
    ), *args, **kwargs)
  File "C:\projects\felix-e3h83\fbuild\lib\fbuild\builders\c\__init__.py", line 394, in _guess_builder
    return fbuild.functools.call(function, ctx, exe, *args, **new_kwargs)
  File "C:\projects\felix-e3h83\fbuild\lib\fbuild\functools.py", line 32, in call
    return import_function(function)(*args, **kwargs)
  File "C:\projects\felix-e3h83\fbuild\lib\fbuild\builders\c\msvc.py", line 531, in static
    compiler=Compiler(ctx, Cl(ctx, **kwargs),
  File "C:\projects\felix-e3h83\fbuild\lib\fbuild\db\__init__.py", line 68, in __call__
    *args, **kwargs)
  File "C:\projects\felix-e3h83\fbuild\lib\fbuild\db\database.py", line 165, in call
    call_result = function(*args, **kwargs)
  File "C:\projects\felix-e3h83\fbuild\lib\fbuild\db\__init__.py", line 64, in __call_super__
    return super().__call__(*args, **kwargs)
  File "C:\projects\felix-e3h83\fbuild\lib\fbuild\builders\c\msvc.py", line 38, in __init__
    if not self.check_flags(flags):
  File "C:\projects\felix-e3h83\fbuild\lib\fbuild\builders\c\msvc.py", line 116, in check_flags
    self([src], flags=flags, quieter=1, cwd=src.parent)
  File "C:\projects\felix-e3h83\fbuild\lib\fbuild\builders\c\msvc.py", line 103, in __call__
    return self.ctx.execute(cmd, msg2=msg2, **kwargs)
  File "C:\projects\felix-e3h83\fbuild\lib\fbuild\context.py", line 234, in execute
    raise e from e
  File "C:\projects\felix-e3h83\fbuild\lib\fbuild\context.py", line 214, in execute
    **kwargs)
  File "C:\Python35-x64\lib\subprocess.py", line 676, in __init__
    restore_signals, start_new_session)
  File "C:\projects\felix-e3h83\fbuild\lib\fbuild\subprocess\killableprocess.py", line 184, in _execute_child
    startupinfo)
  File "C:\projects\felix-e3h83\fbuild\lib\fbuild\subprocess\winprocess.py", line 159, in ErrCheckCreateProcess
    ErrCheckBool(result, func, args)
  File "C:\projects\felix-e3h83\fbuild\lib\fbuild\subprocess\winprocess.py", line 38, in ErrCheckBool
    raise WinError()
OSError: [WinError 87] The parameter is incorrect.
NMAKE : fatal error U1077: 'C:\Python35-x64\python.EXE' : return code '0x1'
Stop.
Command exited with code 2

@skaller
Copy link
Member Author

skaller commented Sep 23, 2018

Help!! :)

@refi64
Copy link
Contributor

refi64 commented Sep 26, 2018

@skaller Ahhhhh I hate Windows's Unicode handling. 😠

Try this:

self._as_parameter_ = LPCSTR("\0\0".join(values).encode("utf-16le"))

Notice the change from \0 to \0\0.

@skaller
Copy link
Member Author

skaller commented Sep 26, 2018

The problem is Python, not Windows :) Error checking text strictly is a stupid default.
The default should be "it always works, even if the result is a bit nasty" because its
just human readable script. Its still readable even if a few characters are wrong.

@skaller
Copy link
Member Author

skaller commented Sep 26, 2018

Well that fixed the problem I think but now there's another one:

https://ci.appveyor.com/project/skaller/felix/build/1.0.1638

configuring build phase
determining platform     : {'win32', 'windows'}
looking for cl.exe       : ok C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\amd64\cl.exe
checking cl.exe          : failed
looking for clang.exe    : ok C:\Program Files\LLVM\bin\clang.exe
checking clang.exe       : failed
looking for icc.exe      : failed
looking for icc.cmd      : failed
looking for icc.bat      : failed
looking for icc          : failed
looking for gcc.exe      : failed
looking for gcc.cmd      : failed
looking for gcc.bat      : failed
looking for gcc          : failed
looking for cc.exe       : failed
looking for cc.cmd       : failed
looking for cc.bat       : failed
looking for cc           : failed
cannot find a c static builder for frozenset({'gcc', 'icc', 'win32', 'windows', 'clang'})
NMAKE : fatal error U1077: 'C:\Python36-x64\python.EXE' : return code '0x1'
Stop.

The compiler found appears wrong .. it should be the full 64 bit compiler but isn't.
That's VS 2017 not 2015 now. I would think Appveyor has the 64 bit compiler.
I mean, the 32 bit compiler targeting 64 bits is OK, but a lot slower.
Anyhow the test that it works is failing.

@refi64
Copy link
Contributor

refi64 commented Sep 27, 2018

@skaller Can you have AppVeyor type build\host\fbuild.log to print it (or wherever the path is for Felix because I can't remember)? Failing that I'll take a look at fbuild's appveyor and see if it shows anything of interest.

@skaller
Copy link
Member Author

skaller commented Oct 9, 2018

I'm not sure how to do that. However I can run on my Windows box. But I am guessing the problem is in the Python interface. Windows works JUST FINE with 8 bit data. Felix can do things in windows with popen emulation. Why the Python code tries to complex and irrelevant UTF-16 encode/decode crap i don't know.

I am guessing the "null" byte in the string is there because in fact the string is a bad attempt to create main's argv which is an array or null terminates strings.

@skaller
Copy link
Member Author

skaller commented Oct 9, 2018

OK so on my windows box:

checking cl.exe          : 
MainThread: starting "'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\x86_amd64\\cl.exe' /nologo /Ox /IC:\\Users\\skaller\\AppData\\Local\\Temp\\tmp4cb2dw00 C:\\Users\\skaller\\AppData\\Local\\Temp\\tmp4cb2dw00\\temp.c"
 + 'C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\BIN\x86_amd64\cl.exe' /nologo /Ox /IC:\Users\skaller\AppData\Local\Temp\tmp4cb2dw00 C:\Users\skaller\AppData\Local\Temp\tmp4cb2dw00\temp.c
 - exit 3221225781, 2.06 sec

Now, I tried this manually:

C:\cygwin64\home\skaller\felix>"C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\x86_amd64\\cl.exe"
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.23026 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

usage: cl [ option... ] filename... [ /link linkoption... ]

which shows cl.exe is there and it works fine.

The shown command with arguments does NOT work, the quoting is all wrong.
If i fix the quoting i get "no such file or directory" for the C file which is expected,
the temporary file is probably deleted automatically when the process ended.

So the problem is bad quoting.

@skaller
Copy link
Member Author

skaller commented Oct 9, 2018

So .. FYI in https://github.com/felix-lang/felix/blob/master/src/packages/program.fdoc you will find around line 1018 the rules for quoting on Windows for CMD.EXE. Roughly it is appears simple enough:

 instance Shell_class[Win32, Win32Process::process_status_t]
  {
    fun quote_arg(s:string):string => '"' + s + '"';
    fun quote_line_for_system(s:string) => '"' + s + '"';
    fun quote_line_for_popen(s:string) => '"' + s + '"';

    gen raw_get_stdout(x:string) = {
      //eprintln("CMD.EXE: raw_get_stout of " + x);
      var fout = Win32Process::popen_in(x);
      if valid fout do
        var output = load fout;
        var result = Win32Process::pclose fout; 
        return Win32Process::int_of_process_status_t result, output;
      else
        println$ "Unable to run command '" + x "'";
        return -1,"";
      done
    }

and the base type class says:

 virtual fun quote_arg:string->string;
  fun quote_args (s:list[string]) : string => catmap[string] ' ' quote_arg s;

In other words you just put "" around each argument, separate them by a space, and then quote the whole string again with "". If you have sloshes in the arguments DO NOT DOUBLE THEM UP. you do have to double up the sloshes in C or Felix string literals because the language lexer will replace them but not if you're constructing the strings.

Not sure if this helps. It took Shayne Fletcher and myself about 6 months working over Facebook with me on OSX and him on Windows to get this to actually work. It looks and is simple but it took a long time to figure out the rules. What you type at a CMD.EXE prompt is NOT the same as what you supply to system() or popen() the latter require the extra quote of the whole string.
If you do not follow these rules, for example if you fail to quote the individual arguments, it won't work because if CMD.EXE called by system() or popen() sees a leading quote it throws it away. You need the arguments to remain quoted. It weird. But the code above is USED by Felix build system to call MSVC and it works.

@skaller
Copy link
Member Author

skaller commented Oct 9, 2018

BTW: on windows the problem is that programs do NOT get separate arguments as they do on Unix. On WIndows, which used to be MSDOS, programs get a single string and have to parse the string themselves.

What happens is in CMD.EXE, it finds the program name and calls it passing the WHOLE string to it. Now C programs requires main(argc, argv) so the startup code for C has the command line parsing it it. This ONLY applies to console programs! Similarly stdin/stdout are connected to the console in the startup code.

So when you call system() or popen() they start a CMD.EXE instance and pass to it the string argument WITH THE WRAPPER QUOTES REMOVED. So the string you use is not the same as when using the CMD.EXE prompt! The the C startup quote parses that to find the argv arguments, again stripping out the argument quotes if present. The tricky bit is that the whole string quote marks are removed, so if they're not there and the first argument is quoted .. the ARGUMENT quote is removed by mistake, and the end of argument quote then quotes the rest of the string by mistake and so it gets treated as a single argument!

Or something like that. Its a mess. The problem is Unix and C. What MSDOS did was actually more sensible. Actually if you glob commands to CMD.EXE it does nothing: the program has to decode the globs. In Unix shells the shell decodes the globs.

In Felix this is a right pain because I had to pick whether globs are unpacked or not because there is only ONE "system()" method and you really need two, not just for globs but other shell expansions. In Unix it depends on how you do the quoting whether the the shell expands the argument or not: double quoted arguments are expanded, single quoted ones aren't. Its a pain because of multiple layers of quote stripping. Its a pain in plain bash for the same reason. You have to calculate how many levels of (double up) quotes you need based on how many quote stripping passes you go through because you actually dispatch to the process you want (since the shell can contains, say, parens which spawn sub-shells ... ARGGGH) :)

@refi64
Copy link
Contributor

refi64 commented Oct 10, 2018

I'm not sure how to do that. However I can run on my Windows box. But I am guessing the problem is in the Python interface. Windows works JUST FINE with 8 bit data. Felix can do things in windows with popen emulation. Why the Python code tries to complex and irrelevant UTF-16 encode/decode crap i don't know.

The Python code uses CreateProcessW (the wide-char) version. Not exactly sure why, but the original Mozilla killableprocess code was like that.

A quoting issue is actually odd. Python should properly quote it, and the quoting issue is probably just from printing it... The test tries compiling an actual C program. Could be an environment issue or something.

I. Really. Hate. Killableprocess.py.

@skaller
Copy link
Member Author

skaller commented Oct 10, 2018

The diagnostic when the compiler is invoked by hand says the file can't be found, because its a temporary already gone. In other words .. the command works fine when i use it manually with the correct quoting. So its not an environment issue uinless the process is started without inheriting the environment. The build system is run in a terminal with MSVC already set up.

The thing is .. the original code USED TO WORK. It just died somehow for no apparent reason .. I switched from VS2015 to VS 2017 in Appveyor but not on my local Windows box and it fails there the same way.

@skaller
Copy link
Member Author

skaller commented Oct 16, 2018

Ok so I fixed it.

Here's how: I DELETED fbuild's subprocess directory and changed the reference
in context.py to just import the system subprocess.

Lo, it works. On OSX and on WIndows and hopefully Linux.
The Appveyor build is running as I write.

It seems that the fbuild subprocess was added so that a broken version of Python
would work. But it actually breaks the system on other versions of Python.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants