Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configuration read doesn't work, returns 0x00 or 0xFF #2

Open
ecm-pushbx opened this issue Nov 30, 2021 · 22 comments
Open

Configuration read doesn't work, returns 0x00 or 0xFF #2

ecm-pushbx opened this issue Nov 30, 2021 · 22 comments

Comments

@ecm-pushbx
Copy link

I managed to add dualflexpress.v to my branch of the CPU86 project (a free 8088 CPU implementation design), running on the MAX1000 board. I wrote a little glue logic to map an MMIO interface for the flash controller. Reading from the flash controller works with that now.

Configuration writes also appear to work -- at least, after calling take_offline and then place_online in my check.asm then the read tests work again (same as right after programming the FPGA). Those are c86boot.asm -D_TESTLOAD=1 and test6.asm -D_AMOUNT=20 -D_ADDRESS=0 -D_STRIDE=1.

Configuration reads however seem to return either 0x00 or 0xFF, not the data they should be reading. With the current check.asm file I get "00000000" for the flash ID and twice "00" then 14 times "FF" for the read test. (Same results with the QSPI controller instead, which I replaced by the DSPI controller.) (Tried with check.asm's define -D_EMPTY_BEFORE_READ from 0 to 10.)

Did I get something wrong?

@ecm-pushbx
Copy link
Author

Had an error in the flashid handler that I fixed now, now ID reads as "00000000" and all sixteen test_read bytes read as "FF".

@ZipCPU
Copy link
Owner

ZipCPU commented Nov 30, 2021

It would help if you followed the right interface specification. The DSPI core follows the Wishbone B4 pipeline standard, not Wishbone B3 or Wishbone "classic". (Wishbone B4 is faster than classic, and can support a higher throughput ...) This isn't Wishbone B4 pipeline logic.

Specific things to note:

  1. STB should be left high until STB && !STALL. Your logic doesn't examine the STALL signal at all. This means that your one request could be turning into multiple requests within the core.
  2. When ACK is high, the return DATA is valid--not a cycle after. If you are getting good reads by waiting a cycle after ACK, then your configuration is still broken. Reads are not working in that case.
  3. I'm not certain what your "flashduration" register is doing, but no such register should be required. CYC & STB should be set on any request, together with the appropriate WE, ADDR, and write DATA and SEL (if writing). These should hold constant until the first cycle STB && !STALL are true. On the next cycle, either STB should be low, or the next request should be present on WE, ADDR, and possible DATA and SEL. When ACK is true, any read data (!WE) will be present in the return DATA wires. Once you have received one ACK for every STB & !STALL, you can then drop CYC and end the bus cycle. No duration counter is required.

To go further, though, I'd need to ask you to provide a trace of the flash interaction, showing how things are or are not working. Can you do that for me?

Dan

@ecm-pushbx
Copy link
Author

I don't have time to work on this right now, but I appreciate your pointers. As you can tell I'm still very new to all of this.

I followed the description in this document, wbspec_b4.pdf, page 41 "Classic pipelined SINGLE READ Cycle" which lists "CLOCK EDGE 0", "... 1", and "... 2". I'm not sure what is meant in that document by "latches data" to be quite honest. I'll defer to your knowledge and fix this when I have the time. Perhaps page 86 or page 98 is closer to what I need? Page 41 does say "classic" so that seems like it may mean non-pipelined.

The flashduration register is just a debugging aid to make sure that the flash controller is at least somewhat operational / addressed correctly. To use it I would manually use the monitor (or later, debugger) to send a command to the MMIO interface then check the results including my duration register.

I don't know how to provide a trace of the interaction. Could you tell me more about how to do that? Do I need another hardware device, or just to modify my design or set up a new design with something else?

@ecm-pushbx
Copy link
Author

I read through most of the Building a Simple Wishbone Master post now. I tried adapting its logic to my mmioflash interface. However, it returns "17000000" for the ID and all "FF" for the data now. The post contains this line:

There’s no means for sending multiple write commands without dropping CYC between them. This will break our flash controller, so we’ll have to come back and fix this.

Is this possibly related to my problems?

Here's my current architecture:

architecture rtl of mmioflash is

signal flashdata : std_logic_vector(31 downto 0);
signal flashaddress : std_logic_vector(31 downto 0);
signal flashcontrol : std_logic_vector(7 downto 0);
signal flashduration : std_logic_vector(7 downto 0);
signal priorclockmem : std_logic;
signal internal_wb_stb : std_logic;
signal internal_cfg_stb : std_logic;
signal internal_wb_cyc : std_logic;

begin

	wb_stb <= internal_wb_stb;
	cfg_stb <= internal_cfg_stb;
	wb_cyc <= internal_wb_cyc;

 process(abus, clock, clockmem, wren, reset_n, flashcontrol, wb_ack, o_wb_data)
  begin
		if (reset_n = '0') then
			flashdata <= X"00000000";
			flashaddress <= X"00000000";
			flashcontrol <= X"00";
			flashduration <= X"00";
			internal_wb_stb <= '0';
			internal_cfg_stb <= '0';
			internal_wb_cyc <= '0';
			i_wb_data <= X"00000000";
			priorclockmem <= '0';
		elsif rising_edge(clock) then
			if (internal_wb_stb = '1' or internal_cfg_stb = '1') then
				if (wb_stall = '0') then
					internal_wb_stb <= '0';
					internal_cfg_stb <= '0';
				else
					flashduration <= std_logic_vector(unsigned(flashduration) + 1);
				end if;
				if (wb_stall = '0' and wb_ack = '1') then
					if (flashcontrol(6) = '0') then
						flashdata <= o_wb_data;
					end if;
					internal_wb_cyc <= '0';
					flashcontrol(5) <= '0';
					flashcontrol(6) <= '0';
					flashcontrol(7) <= '0';
				end if;
			elsif (internal_wb_cyc = '1') then
				if (wb_ack = '1') then
					if (flashcontrol(6) = '0') then
						flashdata <= o_wb_data;
					end if;
					internal_wb_cyc <= '0';
					flashcontrol(5) <= '0';
					flashcontrol(6) <= '0';
					flashcontrol(7) <= '0';
				end if;
			end if;
		if (clockmem = '0') then
			priorclockmem <= '0';
		end if;
		if (clockmem = '1' and priorclockmem = '0') then
			priorclockmem <= '1';
			if (wren='0') then
				case abus is
				 when "0000"  => dbus <= flashdata(7 downto 0);
             when "0001"  => dbus <= flashdata(15 downto 8);
             when "0010"  => dbus <= flashdata(23 downto 16);
             when "0011"  => dbus <= flashdata(31 downto 24);
             when "0100"  => dbus <= flashaddress(7 downto 0);
             when "0101"  => dbus <= flashaddress(15 downto 8);
             when "0110"  => dbus <= flashaddress(23 downto 16);
             when "0111"  => dbus <= flashaddress(31 downto 24);
             when "1000"  => dbus <= X"00";
             when "1001"  => dbus <= X"00";
             when "1010"  => dbus <= X"00";
             when "1011"  => dbus <= X"00";
             when "1100"  => dbus <= flashcontrol(7 downto 0);
             when "1101"  => dbus <= flashduration(7 downto 0);
             when "1110"  => dbus <= X"00";
             when "1111"  => dbus <= X"00";
             when others    => dbus <= "--------";
				end case;
			else -- wren='1'
				if (abus="1100") then
					if (data(0) = '1') then
						wb_addr <= flashaddress(21 downto 0);
						wb_we <= '0';
						internal_wb_cyc <= '1';
						internal_wb_stb <= '1';
						internal_cfg_stb <= '0';
						flashcontrol(7) <= '1';
						flashcontrol(6) <= '0';
						flashduration <= X"00";
					elsif (data(2) = '1') then
						wb_addr <= (others => '0');
						wb_we <= '0';
						internal_wb_cyc <= '1';
						internal_wb_stb <= '0';
						internal_cfg_stb <= '1';
						flashcontrol(7) <= '1';
						flashcontrol(6) <= '0';
						flashduration <= X"00";
					elsif (data(3) = '1') then
						wb_addr <= (others => '0');
						i_wb_data <= flashdata;
						wb_we <= '1';
						internal_wb_cyc <= '1';
						internal_wb_stb <= '0';
						internal_cfg_stb <= '1';
						flashcontrol(7) <= '1';
						flashcontrol(6) <= '1';
						flashduration <= X"00";
					end if;
				else
				case abus is
				 when "0000"  => flashdata(7 downto 0) <= data;
             when "0001"  => flashdata(15 downto 8) <= data;
             when "0010"  => flashdata(23 downto 16) <= data;
             when "0011"  => flashdata(31 downto 24) <= data;
             when "0100"  => flashaddress(7 downto 0) <= data;
             when "0101"  => flashaddress(15 downto 8) <= data;
             when "0110"  => flashaddress(23 downto 16) <= data;
             when "0111"  => flashaddress(31 downto 24) <= data;
             when "1000"  => null;
             when "1001"  => null;
             when "1010"  => null;
             when "1011"  => null;
             when "1100"  => null;
             -- when "1100"  => flashcontrol(7 downto 0) <= data;
             when "1101"  => null;
             when "1110"  => null;
             when "1111"  => null;
             when others    => null;
				end case;
				end if;
			end if;
			end if;
		end if;
end process;
end rtl;

@ZipCPU
Copy link
Owner

ZipCPU commented Dec 6, 2021

The flash controller issue has been fixed with the introduction of a configuration port.

In my older flash controller, you could write to the flash by simply issuing write commands via Wishbone. This would drop the CSN line and issue a write request. You could write up to 256 bytes (one page) at a time per the flash spec, and then the flash would go into never never land while writing this data. At issue was knowing when to raise the CSN line and drop the write request. I used the CYC line to do that: once CYC went low, I'd raise the CSN line and thus end the write request.

In the newer flash controller, such as the dualflexpress controller you are using, there's a separate config port. Using this config port, you can control the CSN on a beat by beat basis. As a result, any proper Wishbone interaction will work--with the limitation that you cannot read from the flash while the config port is active. Boiling that down further, multiple masters may interact with the flash all they want in whatever order up until the config port. When using the config port, only one master and/or one software thread may interact with the flash until any writes/erases/etc are complete.

As for the B4 standard you referenced, yes, illustration 5-4 on page 86 for pipelined reads is an appropriate reference. Beware, though, that flash reads can take a long time. Hence, you may find a lot of clocks taking place between STB && !STALL and ACK. (For the read port, this should work out to about 36 clocks or so--assuming OPT_CLKDIV == 0)

Your logic above (posted above) is also looking much better.

Here's what a trace might look like:

dualflexpress

In this trace: 1) it starts early on with a reset. It's then followed by CYC && STB && !STALL. Once this happens, CS_n drops, and the SCK pin becomes active. Many cycles of interaction then take place over the DSPI port until ACK becomes active. At that point, the DSPI controller accepts a second request, which is then immediately started once CSN is released. You can also read from the trace the correct answer coming from the SPI port: 0x04444401. Such a trace will help you and I understand whether your problem is coming from the interaction between the controller and your CPU, or between your interaction between the controller and the flash.

Test #1 should be whether or not you can read the manufacturers ID from the flash. This is an important test, because it will tell you if you have the timing right or whether or not you've misconfigured the hardware set up--and if so by how many clocks the set up has been misconfigured by. You can read the software I use to capture this flash ID here. According to the WinBond PDF, the manufactuer ID byte should be 0xEF, which would be followed by two device ID bytes. Those last two bytes should look something like 0x4017, but I can't quite be certain since the Winbond datasheet is a generic one rather than the specific one for this device.

Once you can receive this proper ID from the flash, your next step should be to send the restore_dualio() sequence. If done properly, you should then be able to read from the device. In this case, again, it will help to know what values are on your device to properly verify that they are being read properly. A device that has never been written to will likely be all 0xff's. With a programmed Xilinx device, the flash will begin with a synchronization sequence--although I'm not sure what that sequence would be with Intel device.

Dan

@ecm-pushbx
Copy link
Author

I committed my new logic as quoted above in the repo now. However, still no luck in making the config port work. The flash ID is read as "17000000" and all data reads are "FF". Additionally, executing check.asm, the calls to the dataread function (using the MMIO command 01h) work at first. However, the controller does not successfully return to read mode after the further tests (using the config port, on MMIO commands 04h and 08h). The test6.asm or check.asm dataread tests (both using only MMIO command 01h) only work initially, after programming the FPGA or after resetting it using the user button. The data read tests do not work any longer after trying the flashid function (ported from your driver, returns "17000000") and the test_read function (which tries to read the flash using an "F_READ" command (03h) using the MMIO command 04h (config read) and 08h (config write), and returns all "FF" for its data).

In the hopes that it might make a difference I also added your dualflexpress.v from the arrowzip repo (which also required adding wbarbiter.v, and needed the localparam replacement plus setting CLKDIV=1, RDDELAY=0, NDUMMY=4), and adjusted my MMIO and top modules. The result is very similar: reading from the controller initially works, my own test_read function returns all FF, and after running check.asm once, the controller gets stuck in a state where reading from the controller always returns all zeros (and flash ID = "FFFFFFFF" and reading all "FF" again from test_read). The only difference is that the first flash ID function call results in "F100C182" instead of "17000000".

@ecm-pushbx
Copy link
Author

Just tried running only the flashid function (with its take_offline and place_online calls), without test_read. This also 1. returns "F100C182" as the ID, 2. leaves the controller in a state where reading (MMIO command 01h) returns all zeros, and 3. the next flashid function call returns "FFFFFFFF".

@ecm-pushbx
Copy link
Author

The flash controller issue has been fixed with the introduction of a configuration port.

In my older flash controller, you could write to the flash by simply issuing write commands via Wishbone. This would drop the CSN line and issue a write request. You could write up to 256 bytes (one page) at a time per the flash spec, and then the flash would go into never never land while writing this data. At issue was knowing when to raise the CSN line and drop the write request. I used the CYC line to do that: once CYC went low, I'd raise the CSN line and thus end the write request.

In the newer flash controller, such as the dualflexpress controller you are using, there's a separate config port. Using this config port, you can control the CSN on a beat by beat basis. As a result, any proper Wishbone interaction will work--with the limitation that you cannot read from the flash while the config port is active. Boiling that down further, multiple masters may interact with the flash all they want in whatever order up until the config port. When using the config port, only one master and/or one software thread may interact with the flash until any writes/erases/etc are complete.

Good, so that is not the cause of my problems.

As for the B4 standard you referenced, yes, illustration 5-4 on page 86 for pipelined reads is an appropriate reference. Beware, though, that flash reads can take a long time. Hence, you may find a lot of clocks taking place between STB && !STALL and ACK. (For the read port, this should work out to about 36 clocks or so--assuming OPT_CLKDIV == 0)

I'm using CLKDIV = 1 but anyway I don't particularly need very high performance.

Your logic above (posted above) is also looking much better.

Here's what a trace might look like:

dualflexpress

In this trace: 1) it starts early on with a reset. It's then followed by CYC && STB && !STALL. Once this happens, CS_n drops, and the SCK pin becomes active. Many cycles of interaction then take place over the DSPI port until ACK becomes active. At that point, the DSPI controller accepts a second request, which is then immediately started once CSN is released. You can also read from the trace the correct answer coming from the SPI port: 0x04444401. Such a trace will help you and I understand whether your problem is coming from the interaction between the controller and your CPU, or between your interaction between the controller and the flash.

What tools do you use to create a trace like that?

Test #1 should be whether or not you can read the manufacturers ID from the flash. This is an important test, because it will tell you if you have the timing right or whether or not you've misconfigured the hardware set up--and if so by how many clocks the set up has been misconfigured by. You can read the software I use to capture this flash ID here. According to the WinBond PDF, the manufactuer ID byte should be 0xEF, which would be followed by two device ID bytes. Those last two bytes should look something like 0x4017, but I can't quite be certain since the Winbond datasheet is a generic one rather than the specific one for this device.

I did port the flashid function from your driver. (But from the qspiflash repo, not arrowzip - I think that particular function doesn't differ between them though.) However, as I listed, the ID always is returned as "17000000" at first (qspiflash repo's dualflexpress) or "F100C182" at first (arrowzip repo's dualflexpress), or "FFFFFFFF" later. Is the "17" in the older dualflexpress's ID result possibly a corrupted / partial byte of the correct ID?

Once you can receive this proper ID from the flash, your next step should be to send the restore_dualio() sequence. If done properly, you should then be able to read from the device. In this case, again, it will help to know what values are on your device to properly verify that they are being read properly. A device that has never been written to will likely be all 0xff's. With a programmed Xilinx device, the flash will begin with a synchronization sequence--although I'm not sure what that sequence would be with Intel device.

I can read from the flash fine as long as I do not use the config port at all. Here's the output from two runs of check.asm -D_TESTREAD=0. The data read at first is correct (4D 5A etc - this is the MZ header of an 86-DOS executable / kernel image I wrote to the beginning of the flash). Oh, I just noticed that your more recent dualflexpress (from the arrowzip repo) does not provide the OPT_ENDIANSWAP option. I'll return to the qspiflash repo's dualflexpress.

MON88 8088/8086 Monitor ver 0.12
Copyright WWW.HT-LAB.COM 2005-2008
Modified by C. Masloch, 2021
All rights reserved.

 md>
Cmd>l 
Start upload now, load is terminated by :00000001FF
>.....................
00000000=4D5AEB16
00000001=C3000000
00000002=F600E006
00000003=E006941D
00000004=00080000
00000005=4001F0FF
00000006=0000FAFC
00000007=5245E800
00000008=005981F9
00000009=21017503
F100C182

Program terminated with int19.

MON88 8088/8086 Monitor ver 0.12
Copyright WWW.HT-LAB.COM 2005-2008
Modified by C. Masloch, 2021
All rights reserved.

 md>
Cmd>l 
Start upload now, load is terminated by :00000001FF
>.....................
00000000=00000000
00000001=00000000
00000002=00000000
00000003=00000000
00000004=00000000
00000005=00000000
00000006=00000000
00000007=00000000
00000008=00000000
00000009=00000000
FFFFFFFF

@ecm-pushbx
Copy link
Author

This is the output from running check.asm -D_TESTREAD=0 twice, using the qspiflash repo's dualflexpress with CLKDIV=1, RDDELAY=0, NDUMMY=4, ENDIANSWAP=1 (running this commit). As you can see it returns "17000000" for the ID repeatedly. Data read from flash is fine before using the config port. Afterwards it is all zeros. 4D, 5A, EB, 16 is the correct first 4 bytes read from the start of the flash.

MON88 8088/8086 Monitor ver 0.12
Copyright WWW.HT-LAB.COM 2005-2008
Modified by C. Masloch, 2021
All rights reserved.

 md>
Cmd>l 
Start upload now, load is terminated by :00000001FF
>.....................
00000000=16EB5A4D
00000001=000000C3
00000002=06E000F6
00000003=1D9406E0
00000004=00000800
00000005=FFF00140
00000006=FCFA0000
00000007=00E84552
00000008=F9815900
00000009=03750121
17000000

Program terminated with int19.

MON88 8088/8086 Monitor ver 0.12
Copyright WWW.HT-LAB.COM 2005-2008
Modified by C. Masloch, 2021
All rights reserved.

 md>
Cmd>l 
Start upload now, load is terminated by :00000001FF
>.....................
00000000=00000000
00000001=00000000
00000002=00000000
00000003=00000000
00000004=00000000
00000005=00000000
00000006=00000000
00000007=00000000
00000008=00000000
00000009=00000000
17000000

@ZipCPU
Copy link
Owner

ZipCPU commented Dec 8, 2021

Sigh. Without a trace to work from, you're really making this a challenge. What tool can be used to provide a trace? Many.

  • Most vendor packages (a.k.a. Quartus) will provide a simulation capability that can generate a trace in simulation.
  • Most vendor packages will provide an internal logic analyzer tool that you can use to read bits back from within the design and so generate a trace. With Intel, this is called SignalTap. (I've never used it ... but it should do the trick.) The thing to beware of is that SignalTap is a piece of logic you have to place onto your FPGA--so it uses up some of your area, while also collecting trace information from within your design.
  • I typically do my simulations using Verilator. I can get traces that way as well. The problem for you, though, is that Verilator is a Verilog only tool--so that's not likely to work for you.
  • When a design is running, I use wbscope to pull a trace from a design. The software that comes with it can generate a VCD trace for you. There's also a compressed version of the scope in the same repo which I would use for things like this, where you might be running for many cycles without wanting or needing to know what the various captured bits are. (i.e., when you aren't commanding anything.)
  • The trace above, however, was generated using the formal verification tool SymbiYosys.

You may notice that all of my designs have one or more WBSCOPEs in them. It's just a fact of life.

Once you can extract a trace from your design, while running it on the Max1000, showing what's going on right and wrong from within the design--that's when you'll be cooking with gas. You'll find debugging goes a lot faster once you can "see" what's going on within your FPGA.

Now, onto some of the things you've sent. I've examined your assembly file, and I do find a problem--though perhaps not a relevant one. CFG_WEDIR should not be used for the dummy cycles. CFG_WEDIR tells the Dual SPI controller to drive both MOSI and MISO wires. The dummy cycles, however, are used to change pin direction--from the master driving those wires to the slave driving them. Therefore, CFG_WEDIR should be low for these cycles.

Also, let me ask, is there a possibility that you have the byte order wrong when sending the config bytes? That is, are bits [15:8] getting mixed up with bits [7:0] of the i_wb_data word? Or for that matter that you are even setting i_wb_data word correctly at all?

Now to explain some things:

The DSPI controller depends upon the flash being in a special state. I think it's called Dual I/O XIP or some such. When the flash is in this state, the controller can handle a MMIO request by sending 6 bytes of address, 1 mode byte, and some dummy clocks before being able to read data coming back from the flash at the requested address. Note that it doesn't send a command byte. This is normal operation for the controller. I use it because, for a 2-bit data bus, this is the fastest you can go from idle to data response.

If the controller isn't in DIO XIP mode when you try to read from the flash, then you've got some problems. First, it's going to interpret your first two address bytes as a command byte. Not only that, but the MISO pin will be driven by both the flash and the FPGA. (Not good.) Bottom line: the flash won't behave. It doesn't matter how many times you try to read from the MMIO in this mode, the flash won't behave. This is normal. It just means you need to put the flash back into the DIO XIP mode. This is also consistent with the hypothesis above that you aren't sending the commands to the flash controller that you think you are sending, and also consistent with the results you are seeing: once you take the flash offline, you are unable to get it to read properly again.

Taking the flash "offline" as I call it involves sending the three address bytes of MMIO to the flash, followed by the wrong mode byte. In DIO XIP mode, this is still a valid read command, but the wrong mode byte tells it to leave DIO XIP mode when complete and go back to SPI mode, so it can accept a command on the next cycle--such as the flash id command. The commands I use for this are carefully chosen so that they don't conflict with a real SPI command--in case the flash is already offline. Hence, taking the flash offline twice won't hurt you. If, however, your config port commands are wrong, such as if they are byte swapped, then sending anything other than the mode byte will also take you out of DIO XIP mode. Again, this is consistent with what you are seeing.

Restoring the flash to DIO XIP requires issuing the command to read in DIO mode, providing 3 bytes of address, and then the proper mode byte. I then like to read at least one byte from the flash for good measure--not knowing if it requires such or not. If you fail to send the proper command byte, or the proper mode byte, this command will not place the flash in DIO XIP mode and all subsequent MMIO reads from it will fail--just like you are seeing. (You might also be causing the controller and flash to drive the data pins at the same time--again, not good.)

It looks like you have some debugging bits available to you. Can you set one of those bits when CYC && CFG_STB && !STALL && i_wb_data == 16'089f and clear it when CFG_STB && !STALL && i_wb_data == anything else? We might then be able to check, after sending such a command as part of the flashid sequence, whether or not the proper command is truly being sent to the flash controller or not.

A second bit could be set to check for any time (CFG_STB || DATA_STB) && STALL are true and yet the data, the write direction, or the address changes. This is a protocol violation, and an easy one to check.

If you still have a bit left over, you might wish to look for any time (CFG_STB || DATA_STB) && STALL is true on any cycle, and then CFG_STB || DATA_STB are low on the next. This could be a valid Wishbone abort request, but probably not what you are intending.

Dan

@ecm-pushbx
Copy link
Author

It looks like you have some debugging bits available to you.

I still have 48 completely unused bits in my 16-byte MMIO segment. And I can enlarge that segment arbitrarily, so lack of bits is not a problem. I will add the ones you suggested later (maybe today).

Can you set one of those bits when CYC && CFG_STB && !STALL && i_wb_data == 16'089f

Do you mean 109Fh? F_MFRID = (CFG_USERMODE|0x09f) and #define CFG_USERMODE (1<<12) so sending F_MFRID to the config port should have I_wb_data equal 109Fh, right?

@ZipCPU
Copy link
Owner

ZipCPU commented Dec 9, 2021

Yes, thank you, 109fh.

@ecm-pushbx
Copy link
Author

Sigh. Without a trace to work from, you're really making this a challenge.

I will look into this when I have more time.

* Most vendor packages will provide an internal logic analyzer tool that you can use to read bits back from within the design and so generate a trace.  With Intel, this is called SignalTap.  (I've never used it ... but it should do the trick.)  The thing to beware of is that SignalTap is a piece of logic you have to place onto your FPGA--so it uses up some of your area, while also collecting trace information from within your design.

[...]

* When a design is running, I use [wbscope](https://github.com/ZipCPU/wbscope) to pull a trace from a design.  The [software that comes with it](https://github.com/ZipCPU/wbscope/blob/master/sw/scopecls.cpp) can [generate a VCD trace for you](https://github.com/ZipCPU/arrowzip/blob/master/sw/host/scopecls.cpp#L319-L478).  There's also a compressed version of the scope in the same repo which I would use for things like this, where you might be running for many cycles without wanting or needing to know what the various captured bits are.  (i.e., when you aren't commanding anything.)

* The trace above, however, was generated using the formal verification tool SymbiYosys.

You may notice that all of my designs have one or more WBSCOPEs in them. It's just a fact of life.

Once you can extract a trace from your design, while running it on the Max1000, showing what's going on right and wrong from within the design--that's when you'll be cooking with gas. You'll find debugging goes a lot faster once you can "see" what's going on within your FPGA.

Okay, so I'd have to look into adding either of these to my design to get a trace.

Now, onto some of the things you've sent. I've examined your assembly file, and I do find a problem--though perhaps not a relevant one. CFG_WEDIR should not be used for the dummy cycles. CFG_WEDIR tells the Dual SPI controller to drive both MOSI and MISO wires. The dummy cycles, however, are used to change pin direction--from the master driving those wires to the slave driving them. Therefore, CFG_WEDIR should be low for these cycles.

Changed that. No difference in behaviour.

Also, let me ask, is there a possibility that you have the byte order wrong when sending the config bytes? That is, are bits [15:8] getting mixed up with bits [7:0] of the i_wb_data word? Or for that matter that you are even setting i_wb_data word correctly at all?

It does seem correct based on my testing so far. I also experimented with swapping the byte order, which resulted in always reading data from the flash correctly (MMIO command 01h), before and after using the config port, but apparently that was because the config port commands never were correctly received in this case.

It looks like you have some debugging bits available to you. Can you set one of those bits when CYC && CFG_STB && !STALL && i_wb_data == 16'089f and clear it when CFG_STB && !STALL && i_wb_data == anything else? We might then be able to check, after sending such a command as part of the flashid sequence, whether or not the proper command is truly being sent to the flash controller or not.

Added to the flashstatus register. As expected the bit is 1 once after sending the CFG_USERMODE | 0x9f, as detected during this call to configwrite before it writes the next command (with input CFG_USERMODE | 0x00). As you can see I made getstatus clear every bit other than bit 0, which indicates that the MMIO module clears bit 0 by itself, without our assistance.

A second bit could be set to check for any time (CFG_STB || DATA_STB) && STALL are true and yet the data, the write direction, or the address changes. This is a protocol violation, and an easy one to check.

Added as well. This bit is never set in my tests.

If you still have a bit left over, you might wish to look for any time (CFG_STB || DATA_STB) && STALL is true on any cycle, and then CFG_STB || DATA_STB are low on the next. This could be a valid Wishbone abort request, but probably not what you are intending.

Also added, also never set in my tests.

Using the current check.asm and mmioflash.vhd I get this result:

Cmd>l 
Start upload now, load is terminated by :00000001FF
>...........................
00000000=16EB5A4D
00000001=000000C3
00000002=06E000F6
00000003=1D9406E0
00000004=00000800
00000005=FFF00140
00000006=FCFA0000
00000007=00E84552
00000008=F9815900
00000009=03750121

00000000=00000000
00000001=00000000
00000002=00000000
00000003=00000000
00000004=00000000
00000005=00000000
00000006=00000000
00000007=00000000
00000008=00000000
00000009=00000000

STATUS SET=0001 FUNCTN=0288 CALLER=0342
(00001117)(00001100)(00001100)(00001100)17000000
(00001100)00 (000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF

Program terminated with int19.

As you can see, the "STATUS SET" message is displayed once only, and only bit 0 is set. That means the command 109Fh is correctly sent at that point, and the other two debugging bits are never turned on. I also checked whether perhaps the data returned from the flash controller is not in the low byte of the o_wb_data signal but this appears not to be the case. The 32-bit values in the parentheses are the full o_wb_data / flashdata values read in each configread call.

@ZipCPU
Copy link
Owner

ZipCPU commented Dec 10, 2021

So ... I decided to do a quick review of your design, and your "clockmem" (a.k.a. clk40 looks like a problem). It appears to be a separate clock in your design that you are treating as though it were logic. This is a recipe for timing problems. Worse, its a recipe for timing problems that might not show up in any analysis trace. I've avoided this design approach like the plague for these reasons, although I have seen an official Xilinx design that used it--there are just too many dragons there.

Why not run the entire design off of the clk40 domain? Why do you need the clk_int80 domain? Or, better yet, why not run the entire design off of the clk_int80 domain? The dualflexpress core can easily adjust to other clock speeds if necessary. In general, there are no problems with running the interface at a slower speed.

The problem is that moving across clock domains is ... painful. It can be done, but it takes work to do it right and more work to prove that you've done it right.

Are you sure this is the demon you want to challenge?

If you really want to go here, then consider this: Expanding your status bits to look (in the flash clock domain, clk_int80 currently) for particular sequences--not just single time steps. Look for the full flash ID sequence getting sent. Look also for the full return-to-online sequence being sent. You can often do this with a shift register or a counter: if (we are expecting step two, and see step two) then move to state two else if (we see something different) then move back to state zero. However, because timing is a tricky and nasty thing, even this might not catch any timing issues that might be present.

Dan

@ecm-pushbx
Copy link
Author

So ... I decided to do a quick review of your design, and your "clockmem" (a.k.a. clk40 looks like a problem). It appears to be a separate clock in your design that you are treating as though it were logic. This is a recipe for timing problems. Worse, its a recipe for timing problems that might not show up in any analysis trace. I've avoided this design approach like the plague for these reasons, although I have seen an official Xilinx design that used it--there are just too many dragons there.

Why not run the entire design off of the clk40 domain? Why do you need the clk_int80 domain? Or, better yet, why not run the entire design off of the clk_int80 domain? The dualflexpress core can easily adjust to other clock speeds if necessary. In general, there are no problems with running the interface at a slower speed.

The entire design cannot be run off the faster (currently 100 MHz) clock because, as far as I understand it, the SDRAM is accessed as if it were SRAM. So the CPU has to run much slower (currently 5 MHz, also used by mmioflash as clockmem). Part of that is the comment above the part you quoted. More details are given in the original repo's wiki:

How do you run a CPU that expects 640kB of static RAM, on a (small) FPGA that only offers about 48kB internal blockram and some big SDRAM chip ?

You then decide to (ab)use the dynamic RAM as a static RAM, of course !

The main issue being that the CPU86 core is not designed to interface with a dynamic RAM, because memory accesses must be synchronous to the CPU clock (no wait states or such)

The idea is to use the FPGA vendor SDRAM controller IP (small, fast, debugged), and clock both the SDRAM and controller to a clock way faster than the CPU, such that it appears to be synchronous (like a static RAM)

By trial and errors, I noticed that reasonable SDRAM memory accesses require more than 16 cycles, and less than 20 cycles

Today I also noticed that while the SDRAM backed storage (08000h..FFEF0h) sort of works, it produces a lot of errors when tested (memtest in my c86boot repo). So something is definitely wrong, perhaps 20 cycles are not (always) enough, or this hack just isn't any good.

Neither of these problems should impact mmioflash or the dualflexpress though. And, during debugging I also attempted to run the flash controller on the 5 MHz clock, dropping the clockmem special casing. This did not alter the behaviour of the mmioflash module.

The problem is that moving across clock domains is ... painful. It can be done, but it takes work to do it right and more work to prove that you've done it right.

Are you sure this is the demon you want to challenge?

Not really, but I am fairly certain this isn't the (only) problem. I will try it again with the dual clock stuff removed.

If you really want to go here, then consider this: Expanding your status bits to look (in the flash clock domain, clk_int80 currently) for particular sequences--not just single time steps. Look for the full flash ID sequence getting sent. Look also for the full return-to-online sequence being sent. You can often do this with a shift register or a counter: if (we are expecting step two, and see step two) then move to state two else if (we see something different) then move back to state zero. However, because timing is a tricky and nasty thing, even this might not catch any timing issues that might be present.

Thanks, I may try that.

@ecm-pushbx
Copy link
Author

ecm-pushbx commented Dec 14, 2021

Just tested and running the flash off the slower clock, which drops all the dual clock things in mmioflash, does not alter the behaviour. This is the output from running check.asm -D_EMPTY_BEFORE_READ=1:

Cmd>l 
Start upload now, load is terminated by :00000001FF
>...........................
00000000=16EB5A4D
00000001=000000C3
00000002=06E000F6
00000003=1D9406E0
00000004=00000800
00000005=FFF00140
00000006=FCFA0000
00000007=00E84552
00000008=F9815900
00000009=03750121

00000000=00000000
00000001=00000000
00000002=00000000
00000003=00000000
00000004=00000000
00000005=00000000
00000006=00000000
00000007=00000000
00000008=00000000
00000009=00000000

STATUS SET=0001 FUNCTN=0288 CALLER=0342
(00001117)(00001100)(00001100)(00001100)17000000
(000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF (000011FF)FF

Program terminated with int19.

@ecm-pushbx
Copy link
Author

I also changed the CPU clock to 4 MHz so the SDRAM gets 25 SDRAM clocks per CPU clock. This fixes my memtest. This also means I can eventually replace the internal RAM (currently used for the first 32 KiB of the memory) by the SDRAM entirely, which would allow me to use all the internal RAM for the scope.

@ZipCPU
Copy link
Owner

ZipCPU commented Jan 12, 2022

Any news?

@ecm-pushbx
Copy link
Author

No, sorry, I've been busy. I'll probably try to figure out some other problems running my debugger next. (Trace interrupt sometimes fires too late, arrow keys send the wrong escape codepoints across the serial port.)

@ZipCPU
Copy link
Owner

ZipCPU commented Jan 12, 2022

Fair enough.

Let me also remind you that you can only interact with the device in config mode or in memory mapped mode. If you make a mistake and try to interact with it in memory mapped mode between when you've issued your first config command and the time the script is complete to go back to memory mapped QSPI mode ... bad things happen, and it certainly won't transition properly back to memory mapped mode at the end of the script. I'm not sure if this is your issue or not, but it's worth reminding.

Dan

@ecm-pushbx
Copy link
Author

Yes, I am aware. If I recall: I found that when I've tried using the configuration mode, a hardware reset (using the board's reset button) or reprogramming the hardware will restore the read mode so I can read from the flash again.

@ZipCPU
Copy link
Owner

ZipCPU commented Jan 12, 2022

Given that the reset will cause the flash to re-execute its start up script, this simply proves that ... the flash will respond properly when given the proper requests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants