User Guide
Chapter 3 Tutorials
The best way to get started with OpTiMSoC after you’ve prepared your system as described in the previous chapter is to follow some of our tutorials. They are written with two goals in mind: to introduce some of the basic concepts and nomenclature of manycore SoC, and to show you how those are implemented and can be used in OpTiMSoC.
Some of the tutorials (especially the first ones) build on top of each other, so it’s recommended to do them in order. Simply stop if you think you know enough to implement your own ideas!
3.1 Starting Small: Compute Tile and Software (Simulated)
A good starting point is to simulate a single compute tile of a distributed memory system. A simple example is included for this purpose; it demonstrates the general simulation approach and gives some insight into the software build process.
A single compute tile is essentially an OpenRISC core plus memory and the network adapter. The I/O of the network adapter is not functional in this test case, so the system can only be used to simulate local software.
You can find this example in $OPTIMSOC/examples/dm/compute_tile.
In default mode, the simulations start a server that the host software can connect to, but you can pass the parameter --standalone to run them in standalone mode. If you start the simulation now
$OPTIMSOC/examples/dm/compute_tile/tb_compute_tile --standalone |
you will get this output
%Error: ct.vmem:0: $readmem file not found |
Aborting... |
Aborted (core dumped) |
The simulations always expect vmem files that initialize the memories. These files need to be generated from the compiled source code.
Our demonstration software is available in an extra repository:
git clone https://github.com/optimsoc/baremetal-apps |
cd baremetal-apps |
Build a simple “Hello World” example:
cd hello |
make |
You will then find the executable ELF file as hello/hello.elf. Furthermore, some other files are built:
- hello.dis is the disassembly of the file
- hello.bin is the raw binary representation of the ELF file
- hello.vmem is a textual copy of the binary file
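To illustrate what a "textual copy" of a binary can look like, here is a minimal sketch. It assumes a layout of one 32-bit word per line in hexadecimal, which is a form that Verilog's $readmemh accepts. The layout the OpTiMSoC build actually produces (address markers, words per line) may differ, and the function name is hypothetical.

```python
def bin_to_hex_words(data: bytes, word_bytes: int = 4) -> str:
    """Render a raw binary image as one hexadecimal word per line.

    Assumption: words are written in the byte order in which they
    appear in the image (big-endian reading, as on OpenRISC).
    """
    # Pad with zero bytes so the image is a whole number of words.
    padding = (-len(data)) % word_bytes
    data = data + b"\x00" * padding
    lines = []
    for offset in range(0, len(data), word_bytes):
        word = data[offset:offset + word_bytes]
        lines.append(word.hex())
    return "\n".join(lines) + "\n"
```

Calling it on the contents of hello.bin would give a text file you can compare against hello.vmem to see how the two relate.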
You can now run the example again, this time with a different memory initialization file:
$OPTIMSOC/examples/dm/compute_tile/tb_compute_tile --standalone --meminit=hello.vmem |
The simulation should terminate with:
[157801] Core 0 has terminated |
[157801] All cores terminated. Exiting.. |
Furthermore, you will find a file called stdout.000 which shows the actual output:
Hello World! Core 0 of 1 in tile 0, my absolute core id is: 0 |
There are 1 compute tiles: |
rank 0 is tile |
But how does the actual printf-output get there when there is no UART or similar?
OpTiMSoC software makes extensive use of a useful part of the OpenRISC ISA. The “no operation” instruction l.nop has a parameter K in assembly. This parameter can be used for simulation purposes such as instrumentation, tracing, or special operations like writing characters with minimal intrusion or terminating the simulation.
Termination is forced with l.nop 0x1. A trace monitor observes the instruction and terminates the simulation once it has been observed on all cores (shortly after main returned).
With this method you can simply pass constants to your simulation environment. For variables, the method is extended by putting data in further registers (often r3). This is still minimally intrusive and allows you to trace values. The printf output is also produced this way (see newlib):
void sim_putc(unsigned char c) { |
asm("l.addi\tr3,%0,0": :"r" (c)); |
asm("l.nop %0": :"K" (NOP_PUTC)); |
} |
This function is called from printf as the write function. The trace monitor captures these characters and writes them to the stdout file.
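The monitor side of this mechanism can be pictured with a small host-side model. This is only a sketch, not the simulator's actual trace monitor: the class and method names are hypothetical, and the numeric value of NOP_PUTC is an assumption (only l.nop 0x1 for termination is stated above).

```python
NOP_EXIT = 0x1  # from the text: l.nop 0x1 forces termination
NOP_PUTC = 0x4  # assumption: the value behind NOP_PUTC is not given in this guide


class TraceMonitor:
    """Toy model of a monitor that reacts to observed l.nop instructions."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.output = {core: [] for core in range(num_cores)}
        self.terminated = set()

    def observe_nop(self, core, k, r3):
        """Handle one `l.nop K` seen on `core`; r3 is the register value then.

        Returns True once every core has executed the exit nop.
        """
        if k == NOP_PUTC:
            # The character to print was placed in r3 by sim_putc.
            self.output[core].append(chr(r3 & 0xFF))
        elif k == NOP_EXIT:
            self.terminated.add(core)
        return len(self.terminated) == self.num_cores

    def stdout(self, core):
        """Text collected for one core (what would end up in stdout.NNN)."""
        return "".join(self.output[core])
```

Feeding it the (K, r3) pairs of a run reproduces the behavior described above: characters accumulate per core, and the simulation ends only after the last core has signalled termination.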
You can easily add your own “traces” using a macro defined in baremetal-libs/src/libbaremetal/include/optimsoc-baremetal.h:
#define OPTIMSOC_TRACE(id,v) \ |
asm("l.addi\tr3,%0,0": :"r" (v) : "r3"); \ |
asm("l.nop %0": :"K" (id)); |
3.2 Going Multicore: Simulate Multicore Compute Tiles
Next you might want to build an actual multicore system. In a first step, you can just start simulations of compute tiles with multiple cores:
$OPTIMSOC/examples/dm/compute_tile-dual/tb_compute_tile --standalone --meminit=hello.vmem |
$OPTIMSOC/examples/dm/compute_tile-quad/tb_compute_tile --standalone --meminit=hello.vmem |
$OPTIMSOC/examples/dm/compute_tile-octa/tb_compute_tile --standalone --meminit=hello.vmem |
You will observe that the simulation runs take longer. After each run, inspect the stdout.* files.
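With several cores there is one output file per core, so a small helper can save some typing. The following sketch collects all stdout.* files from a directory; the function name is hypothetical and the only assumption is that the files are plain text in the directory where the simulation was started.

```python
from pathlib import Path


def read_core_outputs(directory="."):
    """Return {file name: contents} for all stdout.* files, sorted by name."""
    outputs = {}
    for path in sorted(Path(directory).glob("stdout.*")):
        outputs[path.name] = path.read_text()
    return outputs


if __name__ == "__main__":
    # Print every core's output under a small header.
    for name, text in read_core_outputs().items():
        print(f"== {name} ==")
        print(text, end="")
```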
3.3 Tiled Manycore SoC: Simulate a Small 2x2 Distributed Memory System
Next we want to run an actual NoC-based tiled manycore system-on-chip. The examples include system_2x2_cccc. The names of all pre-packaged systems first denote the dimensions and then the instantiated tiles, here cccc for four compute tiles.
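The naming scheme can be made concrete with a short parser. This is a sketch under assumptions: the function name is hypothetical, only the letter c (compute tile) is taken from the text, and names with an extra suffix are deliberately not handled.

```python
import re

# Only 'c' is named in this guide; other tile letters are reported as unknown.
TILE_TYPES = {"c": "compute"}


def parse_system_name(name):
    """Split a name like 'system_2x2_cccc' into dimensions and tile types."""
    match = re.fullmatch(r"system_(\d+)x(\d+)_([a-z]+)", name)
    if match is None:
        raise ValueError(f"unexpected system name: {name!r}")
    dim_a, dim_b = int(match.group(1)), int(match.group(2))
    tiles = match.group(3)
    if len(tiles) != dim_a * dim_b:
        raise ValueError("number of tile letters does not match the dimensions")
    return (dim_a, dim_b), [TILE_TYPES.get(t, "unknown") for t in tiles]
```

For system_2x2_cccc this yields the dimensions (2, 2) and a list of four compute tiles.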
Execute it again to get the hello world experience:
$OPTIMSOC/examples/dm/system_2x2_cccc/tb_system_2x2_cccc --standalone --meminit=hello.vmem |
In our simulation all cores in the four tiles run the same software. Before you shout “that’s boring”: you can still write different code depending on the tile and core on which the software is executed. A couple of functions are useful for that:
- optimsoc_get_numct(): The number of compute tiles in the system
- optimsoc_get_numtiles(): The number of tiles (of any type) in the system
- optimsoc_get_ctrank(): Get the rank of this compute tile in this system. Essentially this is just a number that uniquely identifies a compute tile.
There are more utility functions like these; you can find them in the file baremetal-libs/src/libbaremetal/include/optimsoc-baremetal.h.
A simple application that uses those functions to do message passing between the different tiles is hello_mpsimple. This program uses the simple message passing facilities of the network adapter to send messages. All cores send a message to core 0. Once all messages have been received, core 0 prints the message “Received all messages. Hello World!”.
cd ../hello_mpsimple |
make |
$OPTIMSOC/examples/dm/system_2x2_cccc/tb_system_2x2_cccc --standalone --meminit=hello_mpsimple.vmem |
Have a look at what the software does (you can find the code in $OPTIMSOC_APPS/baremetal/hello_mpsimple/hello_mpsimple.c). Let’s first check the output of core 0.
$> cat stdout.000 |
Wait for 3 messages |
Received all messages. Hello World! |
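The counting logic behind this output can be modelled on the host in a few lines. This is a sequential sketch, not the code of hello_mpsimple.c: the function name is hypothetical, and a queue stands in for the network adapter's receive path.

```python
from collections import deque


def run_hello_mpsimple_model(num_tiles):
    """Sequential model: every rank except 0 sends one message to rank 0."""
    inbox = deque()  # stands in for rank 0's receive buffer
    log = []

    # Ranks 1..num_tiles-1 each send a message carrying their own rank.
    for rank in range(1, num_tiles):
        inbox.append(rank)

    # Rank 0 knows how many messages to expect from the number of tiles.
    expected = num_tiles - 1
    log.append(f"Wait for {expected} messages")

    received = 0
    while received < expected:
        inbox.popleft()
        received += 1

    log.append("Received all messages. Hello World!")
    return log
```

With four tiles the model produces the same two lines as stdout.000 above. In the real system the messages arrive concurrently, so rank 0 has to wait rather than simply drain a queue.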
Finally, let’s have a quick glance at a more realistic application: heat_mpsimple. You can find it in the same place as the previous applications, hello and hello_mpsimple. The application calculates the heat distribution in a distributed manner. The cores coordinate their boundary regions by sending messages around.
Can you compile this application and run it? Don’t get nervous, the simulation can take a couple of minutes to finish. Have a look at the source code and try to understand what’s going on. Also have a look at the stdout log files. Core 0 will also print the complete heat distribution at the end.
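If the idea of coordinating boundary regions is new to you, the following host-side sketch may help before reading the source. It is not the algorithm of heat_mpsimple: it assumes a one-dimensional rod, a plain Jacobi averaging step, and list accesses in place of messages, and all names are hypothetical.

```python
def jacobi_step(values, left, right):
    """One Jacobi update of a 1D chunk, given its two neighbour values."""
    padded = [left] + values + [right]
    return [(padded[i - 1] + padded[i + 1]) / 2
            for i in range(1, len(padded) - 1)]


def simulate(chunks, left_bc, right_bc, steps):
    """Iterate over chunks (one list per modelled tile) with fixed end values."""
    for _ in range(steps):
        # "Message exchange": each chunk learns the edge values of its
        # neighbours; the outermost chunks use the boundary conditions.
        halos = []
        for i, chunk in enumerate(chunks):
            left = left_bc if i == 0 else chunks[i - 1][-1]
            right = right_bc if i == len(chunks) - 1 else chunks[i + 1][0]
            halos.append((left, right))
        # All chunks update from the values of the previous iteration.
        chunks = [jacobi_step(c, l, r) for c, (l, r) in zip(chunks, halos)]
    return chunks
```

Splitting the rod into two chunks gives the same result as computing it in one piece, which is the property the boundary exchange has to preserve.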
3.4 The Look Inside: Introducing the Debug System
Note
This and the following sections have not been tested for this release and will most probably not run exactly as described. They can still serve as a reference to help you better understand OpTiMSoC. They will be rewritten for the upcoming release. Thanks for your understanding!
In the previous tutorials you have seen some software running on a simple OpTiMSoC system. Until now, you have only seen the output of the applications, not how it works on the inside.
This problem is one of the major problems in embedded systems: you cannot easily look inside (especially as soon as you run on real hardware as opposed to simulation). In more technical terms, the system’s observability is limited. A common way to overcome this is to add a debug and diagnosis infrastructure to the SoC which transfers data from the system to the outside world, usually to a PC of a developer.
OpTiMSoC also comes with an extensive debug system. In this section, we’ll have a look at this system, how it works and how you can use it to debug your applications. But before diving into the details, we’ll have a short discussion of the basics which are necessary to understand the system.
Many developers know debugging from their daily work. Most of the time it involves running a program inside a debugger like GDB or Microsoft Visual Studio, setting a breakpoint at the right line of code, and stepping through the program from there on, running one instruction (or one line of code) at a time.
This technique is what we call run-control debugging. While it works great for single-threaded programs, it cannot easily be applied to debugging parallel software running on possibly heterogeneous many-core SoCs. Instead, our solution is solely based on tracing, i.e. collecting information from the system while it is running and then looking at this data later to figure out the root cause of a problem.
The debug system consists of two main parts: the hardware part runs on the OpTiMSoC system itself and collects all data. The other part runs on a developer’s PC (often also called host PC) and controls the debugging process and displays the collected data. Both parts are connected using either a USB connection (e.g. when running on the ZTEX boards), or a TCP connection (when running OpTiMSoC in simulations).
3.5 Verilator: Compiled Verilog simulation
At the moment, running “verilated” simulations is the best supported way of observing the system traces. We will therefore run the examples from before using a verilated simulation and observe the system in the graphical user interface.
In the following we will have a look at building such a system and how to observe it with the GUI. In tbench/verilator/dm you find systems identical to those of the RTL simulation. We will directly start with system_2x2_cccc. In its base folder, simply run make:
$> make |
The command first generates the verilated version of the 2x2 system. It then builds the toplevel files and links them into tb_system_2x2_cccc and tb_system_2x2_cccc-vcd. The latter generates a full VCD trace file of the hardware, which is much slower and can easily take up tens of GB.
Similar to the steps described above you will need to build the software, e.g., the heat example. Again you need to link the vmem file. Now start the simulation:
$> ./tb_system_2x2_cccc |
It will start a debug server and wait for connections:
SystemC 2.3.0-ASI --- Feb 11 2013 12:54:17 |
Copyright (c) 1996-2012 by all Contributors, |
ALL RIGHTS RESERVED |
Listening on port 2200 |
In another console now start the OpTiMSoC GUI:
$> optimsocgui |
In the first dialog window, set the debug backend to Simulation TCP Interface and then proceed. After the GUI has started, connect using Target System → Connect. The system view should change to a 2x2 system.
The last step is to run the system with Target System → Start CPUs. The execution trace at the bottom of the window will start showing execution sections and events. Move the mouse over a section to see its description. Similarly, each event shows a short description.
3.6 Going to the FPGA: ZTEX Boards
The recommended platform for software development, or for any other system that needs no I/O, is the ZTEX boards. Several variants exist; the supported boards are the 1.15 series versions b and d, where the latter is twice as large as the former and can therefore contain more processor cores. The 2x2 example works with both boards.
3.6.1 Prepare: Simulate the Complete System
Before we go to the actual board we want to simulate the entire system on the FPGA to see whether the debug system and the clocks work correctly.
The distribution therefore contains a SystemC module that functionally behaves like the USB chip on the ZTEX boards. The host tools can connect to the debug system via this module using a TCP socket.
The system can be found at tbench/rtl/dm/system_2x2_cccc_ztex.
Build the system by running make. Before you simulate the system, you need to provide a modelsim.ini that contains the Xilinx libraries, either globally or in the system’s folder. Once you have it, you can start the system using
$> make sim-noninteractive |
The simulation will start and you can now connect to the system in a different shell by using the command line interface:
$> optimsoc_cli -bdbgnoc -oconn=tcp,host=localhost,port=2300 |
The command line interface will connect to the system and enumerate all debug modules:
Connected to system. |
System ID: 0x0000ce75 |
Module summary: |
addr. type version name |
0x02 0x02 0x00 ITM |
0x03 0x02 0x00 ITM |
0x04 0x02 0x00 ITM |
0x05 0x02 0x00 ITM |
0x06 0x05 0x00 STM |
0x07 0x05 0x00 STM |
0x08 0x05 0x00 STM |
0x09 0x05 0x00 STM |
0x0a 0x07 0x00 MAM |
0x0b 0x07 0x00 MAM |
0x0c 0x07 0x00 MAM |
0x0d 0x07 0x00 MAM |
The modules are the Instruction Trace Module (ITM), Software Trace Module (STM) and Memory Access Module (MAM) for each of the four cores.
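If you want to process this listing in a script, a few lines of parsing are enough. The following sketch assumes the column layout shown above (address, type, version, name, with hexadecimal numbers); the function name is hypothetical and this is not part of the OpTiMSoC host tools.

```python
from collections import Counter


def parse_module_summary(text):
    """Extract the module table rows from the CLI's enumeration output."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        # Table rows have four columns and start with a hex address.
        if len(fields) == 4 and fields[0].startswith("0x"):
            addr, mod_type, version = (int(f, 16) for f in fields[:3])
            modules.append({"addr": addr, "type": mod_type,
                            "version": version, "name": fields[3]})
    return modules


def count_by_name(modules):
    """Count how many modules of each kind were enumerated."""
    return Counter(m["name"] for m in modules)
```

For the listing above, counting by name gives four modules each of ITM, STM and MAM, one per core.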
Before you start debugging, you need to build the software in the sw subfolder as described before. Once you have built hello_simplemp you can execute it in the simulation.
First you enter interactive mode:
$> optimsoc_cli -bdbgnoc -oconn=tcp,host=localhost,port=2300 |
After enumeration you will get an OpTiMSoC> shell. First you can initialize the memories:
OpTiMSoC> mem_init hello_simplemp.bin 0-3 |
Next you need to enable logging of the software trace events to a file:
OpTiMSoC> log_stm_trace strace |
Then start the system:
OpTiMSoC> start |
Let it run for a while (1 minute) and then leave the command line interface:
OpTiMSoC> quit |
After that you will find the expected output of the trace events in strace.
3.6.2 Synthesis
Once you have verified that the system (including any extensions of your own) works correctly, you can move on to system synthesis for the FPGA. At the moment we support the Synopsys FPGA flow (Synplify).
You can find the system synthesis in the folder syn/dm/system_2x2_cccc_ztex. A Makefile is used to build the systems.
To generate the system first create the project file:
$> make synplify.prj |
Now the Synplify project file has been generated and you’re ready to start the synthesis.
If you want to have the output of the synthesis in a folder different from your source folder (the one where you just ran make), you can set the environment variable OPTIMSOC_SYN_OUTDIR to any path you like, e.g. put
export OPTIMSOC_SYN_OUTDIR=$HOME/syn |
in your profile script (e.g. your ~/.bashrc file) and reload it.
Run the synthesis afterwards, using the target that matches your board (ZTEX 1.15b or 1.15d):
$> make synplify_115b_ddr |
Once the synthesis is finished you can generate the bitstream:
$> make bitgen_115d_ddr |
3.6.3 Testing on the FPGA
Now that you have generated a bitstream we’re ready to upload it to the FPGA. Connect the ZTEX 1.15 board to your PC via USB.
If you run lsusb the board identifies itself as:
Bus 001 Device 004: ID 221a:010 |
No manufacturer or further information is displayed. The reason is that OpTiMSoC would otherwise need to buy its own set of USB identifiers. Instead, all ZTEX boards share the same identifier, and the following command is used to find out details about the firmware, board and capabilities:
$> FWLoader -c -ii |
To use the ZTEX boards as a regular user, it is recommended to add the following udev rule
SUBSYSTEM=="usb", ATTR{idVendor}=="221a", ATTR{idProduct}=="0100", MODE="0666" |
for example in /etc/udev/rules.d/60-usb.rules.
If you are running OpTiMSoC on the board for the first time you need to update the firmware on the board. To do that, switch to the folder src/sw/firmware/ztex_usbfpga_1_15_fx2_fw in your OpTiMSoC source tree. Follow the instructions inside the provided README file to build the required firmware and flash it to the board. All of this only needs to be done once for each board (until the firmware changes).
Now the board will identify itself when you run FWLoader -c -ii:
bus=001 device=4 (`004') ID=221a:100 |
Manufacturer="TUM LIS" Product="OpTiMSoC - ZTEX USB 1.15" SerialNumber="04A32DBCFA" |
productID=10.13.0.0 fwVer=0 ifVer=1 |
FPGA configured |
Capabilities: |
EEPROM read/write |
FPGA configuration |
Flash memory support |
High speed FPGA configuration |
MAC EEPROM read/write |
Everything ready to go? Then upload the bitstream to the FPGA by running
$> make flash_115d_ddr |
in the same folder where you ran make bitgen_... in the previous section. The output should be something like
FWLoader -v 0x221a 0x100 -f -uf /[somepath]/system_2x2_cccc_ztex.bit |
FPGA configuration time: 194 ms |
As the FPGA is now ready, you can use the same method to connect to the FPGA and load software onto it as in Section 3.6.1; only the connection parameters passed to optimsoc_cli are a bit different.
Run
$> optimsoc_cli -i -bdbgnoc -oconn=usb |
to connect to the FPGA board over USB. You should again be presented with a listing of all available debug modules. Now you can continue just as you did before by calling mem_init to load some software onto the FPGA, etc.
Congratulations, you’ve run OpTiMSoC on real hardware for the first time! You can now develop software and explore OpTiMSoC. A handy utility is the Python interface to the command line interface. Instead of using the interactive mode you can run a script like this:
$> optimsoc_cli -s <script.py> -bdbgnoc -oconn=usb |
An example Python script:
mem_init(2,"hello_simple.bin") |
log_stm_trace("strace") |
start() |
You can also connect to the USB now using the GUI. Now you’re ready to explore and customize OpTiMSoC for yourself. Have fun!