Using an Island console application as init process

IDE: Water 10.0.0.2331
Version: Version 10.0.0.2331 (develop) built on talax, 20180928-142533. Commit 8e4eb02
Target: Island (Linux)
Description: An Island console application, if run as Linux init process, hangs without (apparently) even entering Main (or whatever entry point is defined).

Expected Behavior: As long as the application code does not rely on conditions which are not true at the time the init process starts (root filesystem being writable, proc and sys filesystems in place, etc.) it should be able to at least use Linux system calls via rtl.

Actual Behavior: Screen cleared. Flashing cursor. Forever.

Steps:
I’ll outline the steps to reproduce this behavior on a Raspberry Pi 2B, 3B, or 3B+.

  • Download the latest Raspbian Stretch Lite image from here
  • Write the image to a MicroSD card, using Win32 Disk Imager or similar program of your choice
  • Create a new C# Island project in Water, using the “Console Application (Linux)” template
  • Build for armv6 architecture
  • Copy the resulting executable ConsoleApplication to the VFAT partition of the Raspbian MicroSD card
  • Open cmdline.txt with a text editor, perform the following replacements, save and exit:
    • replace root=PARTUUID=ee25660b-02 rootfstype=ext4
      with root=PARTUUID=ee25660b-01 rootfstype=vfat
    • replace init=/usr/lib/raspi-config/init_resize.sh
      with init=/ConsoleApplication
  • Safely disconnect the MicroSD card and put it in the Raspberry Pi
  • Turn on the Raspberry Pi (which should have a monitor connected to its HDMI port)
  • After the so-called “rainbow screen”, you should see the message printed by your application (“The magic happens here.”) followed by a kernel crash (which is expected, given that an init process should not terminate normally but reboot or shut down the system instead).
  • Instead, the screen stays blank with a flashing cursor in the upper-right corner.
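The two cmdline.txt replacements from the steps above can also be scripted. This is only a minimal sketch: the helper name `patch_cmdline` is hypothetical, and the PARTUUID is the one from this report, so it will differ on other images.

```python
# Replacement pairs copied from the steps above. The PARTUUID belongs to the
# image used in this report and is an assumption for any other card.
REPLACEMENTS = [
    ("root=PARTUUID=ee25660b-02 rootfstype=ext4",
     "root=PARTUUID=ee25660b-01 rootfstype=vfat"),
    ("init=/usr/lib/raspi-config/init_resize.sh",
     "init=/ConsoleApplication"),
]


def patch_cmdline(text):
    """Return the kernel command line with both replacements applied."""
    for old, new in REPLACEMENTS:
        if old not in text:
            # Fail loudly rather than silently leaving the old option in place.
            raise ValueError("expected option not found: " + old)
        text = text.replace(old, new)
    return text
```

Reading cmdline.txt from the VFAT partition, passing its contents through `patch_cmdline` and writing the result back should be equivalent to the manual edit. The file must stay a single line.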

Notes:

  • This looks like a problem with the application startup code making some “dangerous” assumptions about the state of the system
  • An init process is not necessarily a C program; a bash script can run as init, as long as you have bash available (which is not the case here, because the EXT4 partition is not mounted at application startup time)
  • A compiled Go program can run as init process. It has been done by these guys on a Raspberry Pi 2B and I personally verified it works on a 3B+ too.
  • I’m aware that using the VFAT partition as root makes any assumption about the system even more dangerous.
    Unfortunately, root is the only filesystem mounted at init startup time, and the application cannot be copied on the EXT4 partition from Windows.

Ah yes. I think I know what’s going on. We depend on what Linux calls an “interpreter”. On x86_64 we use
/lib64/ld-linux-x86-64.so.2

There’s an equivalent one for ARM. We use a loader because that’s what deals with loading libraries (.so files). It looks like the kernel either loads the executable anyway and ignores the interpreter because it doesn’t exist, or the load fails silently.

Go makes system calls directly in its runtime, removing the libc dependency altogether (at the cost of slightly larger executables). I have to think about how best to approach this. It might require a custom linking step with libc statically compiled in and custom linker flags.
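To see which interpreter an executable asks for without having readelf at hand, the PT_INTERP program header can be read directly. This is a minimal sketch using only the Python standard library; the function name `elf_interpreter` is hypothetical, and the offsets follow the standard ELF32/ELF64 header layouts.

```python
import struct

PT_INTERP = 3  # program header type holding the interpreter (loader) path


def elf_interpreter(data):
    """Return the interpreter path an ELF image requests, or None if it has none."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    is_64bit = data[4] == 2                  # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    order = "<" if data[5] == 1 else ">"     # EI_DATA: 1 = little-endian
    if is_64bit:
        (phoff,) = struct.unpack_from(order + "Q", data, 32)
        phentsize, phnum = struct.unpack_from(order + "HH", data, 54)
    else:
        (phoff,) = struct.unpack_from(order + "I", data, 28)
        phentsize, phnum = struct.unpack_from(order + "HH", data, 42)
    for index in range(phnum):
        base = phoff + index * phentsize
        (p_type,) = struct.unpack_from(order + "I", data, base)
        if p_type != PT_INTERP:
            continue
        # p_offset and p_filesz sit at different offsets in the two layouts.
        if is_64bit:
            (offset,) = struct.unpack_from(order + "Q", data, base + 8)
            (size,) = struct.unpack_from(order + "Q", data, base + 32)
        else:
            (offset,) = struct.unpack_from(order + "I", data, base + 4)
            (size,) = struct.unpack_from(order + "I", data, base + 16)
        return data[offset:offset + size].rstrip(b"\0").decode("ascii")
    return None
```

Passing the bytes of ConsoleApplication to `elf_interpreter` should return the loader path for a dynamically linked build. A result of `None` would suggest the executable carries no interpreter request, which is what a fully static link is expected to produce.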

You the man, @ck, as always! :sunglasses:

I was able to follow you, er, more or less. I’ll admit to being no Linux hardcore expert, but I can google and/or stackexchange my way through almost anything.

Just two quick thoughts:

  • if it’s a matter of statically vs. dynamically linking libc, that would be a good candidate for a build option;
  • if you tell me which files you need to start up the application properly (and if Linux can find them in the same directory as the program) I can try putting them alongside the application, see what happens and report.

agreed. Just never had a request for it. I’ll log.

$ readelf ConsoleApplication902 -d

Dynamic section at offset 0x9faa8 contains 21 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
0x0000000000000001 (NEEDED) Shared library: [libgcc_s.so.1]
0x0000000000000001 (NEEDED) Shared library: [libpthread.so.0]
0x0000000000000001 (NEEDED) Shared library: [librt.so.1]

You would need the libraries listed above, plus any dependencies they have themselves.

Here are some more tests I just did:

  1. Added the files you mentioned, plus their only dependency ld-linux-armhf.so.3.
    Result: Kernel panic - Requested init /ConsoleApplication failed (error -2)
    So maybe Linux wants dynamic libraries to be in some specific location…

  2. Moved said files into a lib directory in the VFAT partition (i.e. /lib, as far as the application’s concerned).
    Result: Kernel panic, stating that libdl.so.2 could not be found.
    Mmmh… getting closer. libdl.so.2 is probably loaded dynamically by the startup code, as it wasn’t mentioned anywhere by readelf -d.

  3. Added libdl.so.2 in the above-mentioned lib directory, after verifying that it doesn’t depend on any further missing library.
    Result: Kernel panic - Attempting to kill init! exitcode=0x0000000b
    Something is still failing. I can’t figure out what nor why, but maybe if I use only rtl methods there will be less uncertainty.

  4. Replaced the contents of Program.cs as follows:

    namespace ConsoleApplication
    {
        static class Program
        {
            public static Int32 Main(string[] args)
            {
                rtl.putchar('T');
                rtl.putchar('E');
                rtl.putchar('S');
                rtl.putchar('T');
                rtl.putchar('\n');
                rtl.sleep(5);
                return 0;
            }
        }
    }
    

    Result: Kernel panic - Attempting to kill init! exitcode=0x0000000b (same as before)
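The "(error -2)" in the first test is a negated errno value. Here is a minimal sketch for looking such codes up; the helper name `kernel_error_name` is hypothetical.

```python
import errno
import os


def kernel_error_name(code):
    """Map a kernel-style negative error code (e.g. -2) to its errno name and message."""
    number = abs(code)
    return errno.errorcode.get(number, "unknown"), os.strerror(number)
```

For -2 this gives ENOENT. execve reports ENOENT not only when the program file is missing but also when the ELF interpreter it names cannot be found. That would be consistent with test 1 failing while the loader sat in the root directory, and test 2 getting further once it was under /lib.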

@ck I’m at a loss now, but if you want me to try something else, just ask.

What happens if you do NOT make it exit? I.e. something like:
loop rtl.sleep(5); end;

It seems something exits; but the init process can’t ever exit.

Since there was no “TEST” word printed, no 5 second pause, and the exit code was different from what Main was bound to return, I was already pretty sure the process exited even before entering Main.

Anyway, to be absolutely sure, I replaced Main with:

public static Int32 Main(string[] args)
{
	for (;;)
	{
		rtl.sleep(5);
		rtl.putchar(42);
	}
	return 0; // Elements correctly warns that this is unreachable code
}

Same kernel panic, same message, same exit code 0x0000000b.
Still no 5-second pause, still nothing printed before the kernel panic.

Starting from the stack trace and by perusing this nice syntax-highlighted, cross-referenced kernel source code I tried to track down the problem. Here’s what I found out:

  • the process was terminated because it received a signal with an exit code equal to the signal value (do_group_exit called from get_signal, see here)
  • 0x0000000b = 11 = SIGSEGV (see here)
  • therefore, the process exited because of a segmentation fault. The kernel panic is because it is the init process; otherwise it would just have been terminated and the system would keep working.
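The decoding above can be written down as a small helper. This is a sketch that assumes the value in the panic message uses the usual wait-status encoding (low 7 bits for the terminating signal, bit 7 for "core dumped", next byte for a normal exit status); the function name `decode_exit_code` is hypothetical.

```python
import signal


def decode_exit_code(code):
    """Describe an exit code given in wait-status encoding."""
    sig = code & 0x7F                      # low 7 bits: terminating signal, 0 if none
    if sig:
        suffix = " (core dumped)" if code & 0x80 else ""
        return "killed by " + signal.Signals(sig).name + suffix
    return "exited with status " + str((code >> 8) & 0xFF)
```

With 0x0000000b this reports a SIGSEGV, matching the conclusion above. A normal `return 0` from Main would have shown up as exitcode=0x00000000 instead.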

Unfortunately, the stack trace takes us to the thread scheduler (do_work_pending), not helping us to identify where the segfault actually happened.

OK, I think I’ve got something. This is going to be a bit of a manual process, and you have to run parts of it on a working Raspberry Pi or equivalent hardware (or use a cross linker, but that’s a pain):

 
ld   -\( Island.a  "libgc.a" libgcc.a libgcc_eh.a libpthread.a librt.a libc.a -\) ConsoleApplication907.o ConsoleApplication907.a Island.a --eh-frame-hdr -Bstatic -o ConsoleApplication907

Things you need:
archives.zip (3.2 MB) which has the .a files mentioned above; they’re from a fairly standard raspbian but you can grab your own. It also has a special Island.a which has the compiled (non bitcode) version of the RTL. I used llvm’s tools to generate this.

ConsoleApplication907.a & ConsoleApplication907.o ; those are generated during compiling, usually in a directory like:
C:\Users\me\AppData\Local\RemObjects Software\EBuild\Obj\ConsoleApplication907-D2C85B91A14F7733336A198716A6BB252BE111D2\Debug\Island-Linux\armv6

Note: I couldn’t actually test it running, but this generates a standalone executable without dynamic linking. It runs on rpi/linux but I’m not near my own rpi hardware to try it as an init process.

Obviously I want to make it easier to do it, but I first wanted to know it could work.

I slightly edited your command, changing ConsoleApplication907 to just ConsoleApplication which is my sample project’s name.

  • Cleaned my C:\Users\<myusername>\AppData\Local\RemObjects Software\EBuild\Obj folder
  • Rebuilt the project.
  • Copied files ConsoleApplication.o and ConsoleApplication.a to a folder on my MicroSD card, together with the files extracted from archives.zip, and a link.sh where I put the ld command line copied from your post (with the above-mentioned edit)
  • On the RasPi, copied the folder to my home directory, cd’ed into it, and launched ./link.sh

Here’s what ld tells me:

Island.a(a907b295090da8babc288636fbf2f2b1-Environment.o): In function `ms_t2c_RemObjects_d_Elements_d_System_d_Environmente_UserHomeFolder':
/__windows_drive__c/ci/b/elements/937/source/islandrtl/source/environment.pas:114: warning: Using 'getpwuid' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
ConsoleApplication.o: file not recognized: File format not recognized

I tried rebuilding the project again, but ConsoleApplication.o is byte-for-byte identical to the one on the RasPi.

IslandInitTest.zip (753.2 KB) contains both an IslandInitTest directory with the project, and the ConsoleApplication-<lotsofhexdigits> directory with the .o and .a files.

EDIT: Just FYI, here’s the glibc version on my RasPi:

GNU C Library (Debian GLIBC 2.24-11+deb9u3) stable release version 2.24, by Roland McGrath et al.
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 6.3.0 20170516.
Available extensions:
	crypt add-on version 2.1 by Michael Glad and others
	GNU Libidn by Simon Josefsson
	Native POSIX Threads Library by Ulrich Drepper et al
	BIND-8.2.3-T5B
libc ABIs: UNIQUE
For bug reporting instructions, please see:
<http://www.debian.org/Bugs/>. 

Are you building Debug or Release? I think Debug defaults to non-bitcode and Release to bitcode (you want non-bitcode here).

I was building release. :blush:

Rebuilt in Debug, redid all steps. Here’s ld output now:

Island.a(a907b295090da8babc288636fbf2f2b1-Environment.o): In function `ms_t2c_RemObjects_d_Elements_d_System_d_Environmente_UserHomeFolder':
/__windows_drive__c/ci/b/elements/937/source/islandrtl/source/environment.pas:114: warning: Using 'getpwuid' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
ConsoleApplication.a(94301f90b512bad9bcda9c5e164664ad-Program.o): In function `__elements_entry_point_main':
/__windows_drive__d/projects/misc/islandinittest/program.cs:9: undefined reference to `sleep'
/__windows_drive__d/projects/misc/islandinittest/program.cs:10: undefined reference to `putchar' 

Just for reference, here’s what’s in the directory where I ran ./link.sh:

total 13772
drwxr-xr-x 2 pi pi    4096 Nov  8 15:26 .
drwxr-xr-x 8 pi pi    4096 Nov  8 15:27 ..
-rw-r--r-- 1 pi pi   40348 Nov  8  2018 ConsoleApplication.a
-rw-r--r-- 1 pi pi     980 Nov  8  2018 ConsoleApplication.o
-rw-r--r-- 1 pi pi 5405008 Nov  8 12:54 Island.a
-rw-r--r-- 1 pi pi 2915158 Jan 14  2018 libc.a
-rw-r--r-- 1 pi pi  255630 Jun 27  2017 libgc.a
-rw-r--r-- 1 pi pi 1262850 Feb 14  2018 libgcc.a
-rw-r--r-- 1 pi pi   18700 Feb 14  2018 libgcc_eh.a
-rw-r--r-- 1 pi pi 4118322 Jan 14  2018 libpthread.a
-rw-r--r-- 1 pi pi   54468 Jan 14  2018 librt.a
-rwxr-xr-x 1 pi pi     176 Nov  8 14:52 link.sh 

and here’s link.sh itself:

ld -\( Island.a  "libgc.a" libgcc.a libgcc_eh.a libpthread.a librt.a libc.a -\) ConsoleApplication.o ConsoleApplication.a Island.a --eh-frame-hdr -Bstatic -o ConsoleApplication

hrmm. that should have been covered by libc. I’ll investigate further; unfortunately that will be next week as I’m off tomorrow.
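One possible explanation, offered as an assumption that this thread does not confirm: GNU ld scans each archive once, at its position on the command line, and only pulls members that satisfy symbols undefined so far. In link.sh, libc.a comes before ConsoleApplication.a, and unlike Island.a it is not repeated at the end. The toy model below uses hypothetical names and is a deliberate simplification of the real linker. It shows how that ordering could leave `sleep` and `putchar` unresolved.

```python
def unresolved_after_link(command_line):
    """Simulate one left-to-right pass and return the symbols left undefined.

    command_line is a list of ("object", (defines, needs)) or
    ("archive", {member_name: (defines, needs)}) entries.
    """
    defined, undefined = set(), set()

    def load(defines, needs):
        defined.update(defines)
        undefined.update(needs)
        undefined.difference_update(defined)

    for kind, payload in command_line:
        if kind == "object":
            load(*payload)              # objects are always linked in
            continue
        members = dict(payload)         # archive: visited once, at this position
        progress = True
        while progress:
            progress = False
            for name, (defines, needs) in list(members.items()):
                if undefined & set(defines):
                    load(defines, needs)
                    del members[name]
                    progress = True
    return undefined


# Hypothetical stand-ins for libc.a and the program's object code.
libc = ("archive", {"sleep.o": ({"sleep"}, set()),
                    "putchar.o": ({"putchar"}, set())})
program = ("object", ({"main"}, {"sleep", "putchar"}))
```

In this model `unresolved_after_link([libc, program])` leaves both symbols undefined, while `unresolved_after_link([program, libc])` resolves them. If the same holds for the real link, listing the .o and .a files of the application before the archive group might be worth a try.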

No problem, I can wait until next week.

BTW, it just occurred to me that so far I had only tried running ConsoleApplication as init process.
So I tried running it from bash, and guess what…

pi@raspberry:~ $ ./ConsoleApplication
Segmentation fault
pi@raspberry:~ $ sudo ./ConsoleApplication
Segmentation fault
pi@raspberry:~ $ 

Maybe the problem is not (or not only) related to linking.

EDIT: The same happens even if I comment out everything in Main, leaving only return 0;

hrmm. OK, that’s not right; the app worked fine for me from the RPi itself (both statically linked and regular). This is an RPi 3, right?

It’s a RPi 3 model B+ with latest Raspbian, latest package updates, plus latest Mono (from Mono project’s repository, not Debian).

Maybe some Elements build from Preview or even Experimental channel could change something? I’m currently on build 2331 (Stable) but I have no problem testing later builds if you say so.

Can you try .2343 from yesterday?

Just tried. Elements with Water 10.0.0.2343; Raspberry Pi 3 model B+ with latest Raspbian Lite (not even updated packages). Rebuilt in Debug, copied executable to home directory on RPi, invoked executable from console (local, not SSH).

Alas, same result. Segmentation fault.

Rebuilt in Release, copied etc.etc. Same result.

Exchanged the RPi for an “old” model B, just to exclude a defective board or RAM.
Same result (unless both boards are defective, of course).

That of course is fairly unlikely. I’m going to try and dig deeper into this this week. Still odd it worked for me and not you though.

I completely concur @ck, that’s why I also tried with a “clean” Raspbian Lite image, and on another RPi.

Of course I can send you a zipped image of my MicroSD card if you think it may be of help.

If I may advance a further suggestion, what about building startup code with some good old putchar calls in “strategic” places, just to try and see between which of them the segfault occurs?

(I know it sucks in at least eleven ways as a debugging method, but it has also saved my day, not to mention my posterior, more times than I’d like to admit in my 29 years in the business).

oh I know writeln debugging :slight_smile: That’s how I bootstrap most new targets (like webassembly, Darwin, etc). The first version never has a debugger.