Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vexriscv SaxonSoc Linux Version #1

Closed
lawrie opened this issue Sep 15, 2020 · 28 comments
Closed

Vexriscv SaxonSoc Linux Version #1

lawrie opened this issue Sep 15, 2020 · 28 comments

Comments

@lawrie
Copy link

lawrie commented Sep 15, 2020

I don't know if you are interested but Paul Ruiz (https://gitlab.com/pnru) and myself have got a version of this project working with SaxonSoc Linux in this fork - https://github.com/lawrie/riscv32_lcc

SaxonSoc is a more recent SoC written by @Dolu1990, the author of SpinalHDL, VexRiscv and the Murax SoC, which you use.

We have it running on the hardware of the Ulx3s ECP5 FPGA board, - see recent discussion on https://gitter.im/ulx3s/Lobby.

To make it run on SaxonSoc Linux, ELF output has been added to ld, and risc-v assembler added for crt0.s and syscall.s (replacing syscall.c). A few other changes/fixes have been applied (including a fix to dof to print object files correctly).

It is still work in progress, particularly in the area or the runtime library and system calls.

@michg
Copy link
Owner

michg commented Sep 16, 2020

Thanks for you links, I was not aware of them. Great to hear, that the backend is used and more, it is worked on it.
The ELF output is very nice and I think it is a good idea to integrate your fixes also in the main repository.
Regards,
Michael

@pnrhub
Copy link

pnrhub commented Sep 16, 2020

Hi Michael,
I'd like to understand better what the intended stack frame structure is. The background is that when debugging the argc and argv arguments to main(), I got the impression that perhaps there was a latent bug in the stack frame layout (functions appear to write in the frame of their caller). I realize that you wrote this stuff 4 years ago, but maybe you have some notes or recollection of what the intended layout is.
Paul

@michg
Copy link
Owner

michg commented Sep 16, 2020

Hi Paul,
the relevant code is in function() in riscv32.md. Also a good thing is to look into the generated .s of the testcases/picorv32,
for example have a look into vararg.s.
Michael

@pnrhub
Copy link

pnrhub commented Sep 16, 2020

I was looking at function() quite a bit the other day. I think there may be some double counting in the stack offsets. Take the following C code:

main(int argc, char *argv[])
{
	dummy();
}

This translates to:

	.align 4
	.text
	.globl main
	.align	4
main:
	addi x2,x2,-48
	sw  x8,44(x2)
	addi  x8,x2,32
	sw x1,24(x2)
	sw x12,16(x8)
	sw x13,20(x8)
	jal x1,dummy
	li x0,0
	addi x10,x0,0
L.1:
	lw x1,24(x2)
	lw  x8,44(x2)
	addi  x2,x2,48
	jalr x0,x1,0

x12 and x13 appear to be saved in the caller's stack frame. Is that what was intended?

I would have expected something like:

old x2  -> +--------------+
           |  saved x8    |
           +--------------+
           |              |
           |              |
           |  saved       |
           |   arguments  |
           |              |
           |              |
           |              |
new x8  -> +--------------+
           | saved x1     |
           +--------------+
           |              |
           |              |
           |  local       |
           |   variables  |
           |              |
           +--------------+
           |              |
           |              |
           |  saved       |
           |   registers  |
           |              |
new x2  -> +--------------+

but it would seem that something different is happening.

@michg
Copy link
Owner

michg commented Sep 17, 2020

Hi Paul,
the layout for parameters passed on the stack is different. They are passed on top of new x2.
If params are passed in registers they are duplicated, and copied in the caller stack frame. Probably this is
the thing you have wondered about.

@lawrie
Copy link
Author

lawrie commented Sep 17, 2020

Hi Michael, there is discussion about the progress of the SaxonSoc Linux port on the Ulx3s gitter, if you would like to join in at any time - https://gitter.im/ulx3s/Lobby.

Currently I am trying to support a lot of Linux system calls.

@pnrhub
Copy link

pnrhub commented Sep 17, 2020

OK, my new understanding is as follows:

           |              |
           |              |
           |  call        |
           |   arguments  |
           | - - - - - - -|
           |              | first 6 words are in x12-x17, but
           |              | space reserved for callee saving
           |              |
old x2  -> +--------------+------------------- <- frame boundary
           |  saved x8    |
           | - - - - - - -|
           |              |
           |  local       |
           |   variables  |
           |              |
           +--------------+
           |  0..3 words  | (x8 pointer is always multiple
           |   padding    |  of 16 bytes = 4 words)
new x8 ->  +--------------+           
           |  0..3 words  | (stack frame is always multiple
           |   padding    |  of 16 bytes = 4 words)
           +--------------+
           |              |
           |  callee      | (includes saving x1)
           |  saved regs  |
           |              |
           +--------------+
           |              |
           |  call        | same as above
           |  argument    | size enough for most
           |  space       | complex call made
           |              |
           |              |
new x2  -> +--------------+----------------- <- frame boundary

Notes:
- difference between old x2 and new x8 is called "framesize" in function()
- difference between old x2 and new x2 is called "framesizeabs" in function()

** Figure has been edited in view of below discussion **

If correct, why is x8 (the frame pointer) not saved with the other registers (like x1)?

@michg
Copy link
Owner

michg commented Sep 17, 2020

The local variables are located above new x8. The x8 location is just caused
by the usual saving order.

@lawrie
Copy link
Author

lawrie commented Sep 17, 2020

Hi Michael,

Paul may know better, but I think the changes that you might want to integrate into the main repository are part of this commit - lawrie@1b5d2e3.

There is a one-line change to dof.c which allows the relocation section to be printed.

The two-line change to as.c supports ecall.

The changes to riscv32.md were experimental and have been reverted.

I think the changes to ld.c for elf output are all driven by the -e flag, so you might want to take those.

The changes to linux.c are partly specific to the layout we are using on SaxonSoc Linux (LCCDIR) and partly to produce elf output, but they do enable us to compile and link things with lcc <files> and no other flags.

The rest of the changes we have made are to the C RTL, and I think that is likely to diverge quite a bit from your version as it uses Linux system calls.

There is a further change to ld.c in this commit, but it might be linux-specific - 47c084c

@pnrhub
Copy link

pnrhub commented Sep 17, 2020

Hi Michael,

Thanks - updated the figure and I think I have it correct now. I am uncertain as why x8 is pointing so low in the stack frame, the LCC front end seems to assume it being at or around "old x2" (args have positive offsets, locals negative). For a moment I thought that RiscV may not support negative displacements, but that is not true.

Further to Lawrie's summary above:

  • I don't think you integrated your ld with the lcc controller - because the loader the parameter list seemed to be for the traditional Unix a.out loaders. We've hacked etc/linux.c to work with your loader. Still, it specified the -h option; maybe you also worked with another loader?

  • My reading of ld is that it always does a final link and strips the a.out file. There does not seem to be a -r type option. That by itself made adding ELF support easy: it is hardly more difficult than writing an a.out header (as you will see in the code). The ELF header could be better by having two Pgm header blocks: one for the CODE section and one for the DATA/BSS section.

  • We placed all libraries in one big library: c.lib

  • I badly hacked ld to make it remove the extracted module files. In order to keep track of what was already loaded, I reused the symbol tree code to keep a list/tree with module names. Very ugly, but expedient. I would not take this up as a patch. Moreover, I think leaving the library modules extracted was also part of the test suite, right?

Also some general remarks:

  • When compiling with -g, the lcc controller does not pass the -g flag to the loader (on a traditional Unix ld this is not necessary). I have not fixed this, nor have I tried to place the stabs information in the ELF output. At the moment we do not have a debugger on SaxonSoc, so the stabs info would not have a use anyway.

  • I noticed how you write out the stabs info as json and then use a python VCD parser to do debugging from an Icarus Verilog test run. Very, very cool. I hope to reuse that idea in some of my other projects.

What were your plans with this project when you wrote it? Any current plans?

@michg
Copy link
Owner

michg commented Sep 18, 2020

Hi Paul, hi Lawrie,
the locals all have positive offset from x8. This makes it possible to use c.lw and c.sw in RVC.
I will try to have a look at your changes and then integrate them in to the main-repository.
Regarding plans: the current debug information implementation is not sufficient to use it on larger code base
and it is also based on stabs which should to be reworked. In the future I would like to emit dwarf
in the generated .s files instead of stabs.

@pnrhub
Copy link

pnrhub commented Sep 18, 2020

Ah, had not considered the compressed instruction formats and that these indeed only support positive offsets. Now the placement of the x8 frame pointer makes sense.

What is the rationale for setting the minimum size of the argument area at 4 when there are no function calls made and at 24 when there are?
https://gitlab.com/pnru/ulx3s-misc/-/blob/master/tmp/lcc-riscv/riscv32.md.c#L631

@michg
Copy link
Owner

michg commented Sep 18, 2020

Hi Paul,
24 = 6*4 (first 6 words are in x12-x17), that's the writing space, you have wondered about.
Doing it this way, makes the variadic case a lot of easier.

@pnrhub
Copy link

pnrhub commented Sep 18, 2020

Thanks!

And the rationale for 4 when there are no function calls made? Is something else stored there?

And a nitpick: in ld codeName, dataName and debName are defined as static arrays, not pointers. The if statements e.g. here assumes they are pointers:
https://github.com/michg/riscv32_lcc/blob/master/binutils/ld/ld.c#L214

@michg
Copy link
Owner

michg commented Sep 18, 2020

I am not aware that the 4 bytes are used, I think it's more an alignment to word size.
Thanks, I agree, comparing codeName, dataName and debName with NULL is not correct.

@pnrhub
Copy link

pnrhub commented Sep 18, 2020

Sorry to keep posting, but another question (and maybe bug report):

What if the original maxargoffset is 20? Then it will not get increased to 24 and that may interfere with the variadic code? I realize that the frame will get scaled up to the next multiple of 16, but the register save block locations would still overlap with the minimum argument space that the variadic code expects.
https://gitlab.com/pnru/ulx3s-misc/-/blob/master/tmp/lcc-riscv/riscv32.md.c#L632

(we seem to have a hard to diagnose bug with printf)

@michg
Copy link
Owner

michg commented Sep 18, 2020

You maybe right, I am not aware of an reason, why the comparison is done with "<16".
<24 would make more sense. Are you able to reproduce the bug? Maybe you can
repeat your test with a value of 24?

@pnrhub
Copy link

pnrhub commented Sep 18, 2020

It may be a left over from the MIPS code, which has "if (<16) set to 16" in that spot.

I was confused over the minimum 4 size, it is rounded up to 4, i.e. just an alignment thing.

I can confirm that calling a variadic function with 4 or 5 arguments (e.g. printf with 3 or 4 arguments after the format string) causes a crash, also in simulation. Changing the comparison to <24 fixes this.

The printf issue we had on SaxonSoc seems non-repeatable, which makes it hard to diagnose - it may be different from this bug. I will let you know once I know more.

@pnrhub
Copy link

pnrhub commented Sep 21, 2020

Hi Michael,

The argument fix seems to have cleared the compiler issues. Still a lot of work in the c library and making it work with RiscV32 Linux. It seems derived from a BSD library, but I cannot quite find the version you used as a base. It seems to sit between 4.3 Reno and 4.4. What base did you use?

@michg
Copy link
Owner

michg commented Sep 22, 2020

Hi Paul,
that's great, that the change resolved the issue. By the way, did you test also with the +4?:
framesizeabs = roundup(maxargoffset + sizeisave + sizefsave + 4, 16) + framesize;
Without +4, I had another failing testcase.
The clib was derived from a another mips lcc port:
https://github.com/martynvandijke/Computation-2/tree/d8b8a55f4433c464a0d176f1a996472c65d2e09a/Work_Folder_Mips/mMips/Old%20Stuff/mmips/lcc/libc

@pnrhub
Copy link

pnrhub commented Sep 25, 2020

Yes we have that.

Currently lcc is working fine on SaxonSoc for small experimental programs. Most of the current effort is rounding out the C library with functions and system calls as and when the need comes up. As an experiment, we let lcc compile its own assembler and that worked out OK. The resulting binary is even a bit smaller than the output of gcc-riscv.

I had a quick look at letting lcc compile itself, but was surprised to see that it had compile errors that at first glance were not the result of missing or incompatible include files. Did you try to let lcc-riscv compile its own sources?

@michg
Copy link
Owner

michg commented Sep 26, 2020

Hi Paul,
I didn't try yet to compile lcc itself. The binutils as and ld should be possible until the latest change. With the new
mlib dependency this will probably not work any more, M*LIB requires C99 /C11.

@pnrhub
Copy link

pnrhub commented Sep 26, 2020

Looking at self-hosting bit by bit. lburg and cpp appear to compile & work. ucpp appears to need work to compile with lcc-riscv. What was the reason for using ucpp instead of the lcc provided cpp?

Also wondering about the following. Compiling main() { return 0; }. This produces:

	.globl main
	.align	4
main:
	addi x2,x2,-16
	sw  x8,12(x2)
	addi  x8,x2,0
	li x0,0
	addi x10,x0,0
L.1:
	lw  x8,12(x2)
	addi  x2,x2,16
	jalr x0,x1,0

whereas compiling main() { return 3; } produces:

	.globl main
	.align	4
main:
	addi x2,x2,-16
	sw  x8,12(x2)
	addi  x8,x2,0
	li x10,3
L.1:
	lw  x8,12(x2)
	addi  x2,x2,16
	jalr x0,x1,0

It would seem to me that the code in the target() function in the backend that tries to optimize register x0 as a fixed zero when loading constants is a holdover from MIPS and not really working for the RiscV32 rules, making the generated code worse instead of better. What do you think?

@michg
Copy link
Owner

michg commented Sep 27, 2020

Hi Paul,
the original cpp has some issues: drh/lcc#30. On larger code you will probably hit them.
Regarding li x0,0: CNST are used in a "chain rule", this might be the reason that target optimization does not work.

@michg
Copy link
Owner

michg commented Sep 27, 2020

The chainrule was not the reason, please have a look at the latest change.

@pnrhub
Copy link

pnrhub commented Sep 27, 2020

Yeah, that makes sense. Thanks!

I haven't worked with lcc in 20+ years (present work excluded) and never with version 4.x, but I read the patch as:
if a register holds the constant 0, special case it to emit2, which emits nothing for this case. The target() code already special cases the constant 0 to always be allocated to register x0.

@michg
Copy link
Owner

michg commented Sep 27, 2020

Correct, this is how the mips port does it, and it also works for riscv32..

@michg
Copy link
Owner

michg commented Oct 4, 2020

Thanks Paul and Lawrie. Feel free to reopen it for further discussion.

@michg michg closed this as completed Oct 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants