# Chapter 3 Tutorials

The best way to get started with OpTiMSoC after you’ve prepared your system as described in the previous chapter is to follow some of our tutorials. They are written with two goals in mind: to introduce some of the basic concepts and nomenclature of manycore SoC, and to show you how those are implemented and can be used in OpTiMSoC.

Some of the tutorials (especially the first ones) build on top of each other, so it’s recommended to do them in order. Simply stop if you think you know enough to implement your own ideas!

## 3.1 Starting Small: Compute Tile and Software (Simulated)

A good starting point is to simulate a single compute tile of a distributed memory system. A simple example is included for this purpose; it demonstrates the general simulation approach and gives an insight into the software build process.

A single compute tile is essentially an OpenRISC core plus memory and the network adapter. In this test case none of the network adapter's I/O is functional, so the system can only be used to simulate local software.

You can find this example in $OPTIMSOC/examples/dm/compute_tile. By default the simulations start a server that the host software can connect to, but you can use the parameter --standalone to run them standalone. If you start the simulation now with

    $OPTIMSOC/examples/dm/compute_tile/tb_compute_tile --standalone

you will get this output

    %Error: ct.vmem:0: $readmem file not found
    Aborting...
    Aborted (core dumped)

The simulations always expect vmem files that initialize the memories. These need to be generated from the compiled source code. Our demonstration software is available in a separate repository:

    git clone https://github.com/optimsoc/baremetal-apps
    cd baremetal-apps

Build a simple “Hello World” example:

    cd hello
    make

You will then find the executable ELF file as hello/hello.elf. Furthermore, some other files are built:

• hello.dis is the disassembly of the file

• hello.bin is the raw binary image generated from the ELF file

• hello.vmem is a textual copy of the binary file

You can now run the example again, this time with a memory initialization file:

    $OPTIMSOC/examples/dm/compute_tile/tb_compute_tile --standalone --meminit=hello.vmem

The simulation should terminate with:

    [157801] Core 0 has terminated
    [157801] All cores terminated. Exiting..

Furthermore, you will find a file called stdout.000 which shows the actual output:

    Hello World! Core 0 of 1 in tile 0, my absolute core id is: 0
    There are 1 compute tiles:
    rank 0 is tile
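To make the role of the memory initialization file used above more concrete, the following Python sketch converts a raw binary image into a text file in the style that Verilog's $readmemh can load. It is only an illustration and not the converter used by the OpTiMSoC build: the function name bin_to_vmem, the 32-bit word width, the big-endian byte order and the one-word-per-line layout are all assumptions, and the real hello.vmem may be laid out differently.

```python
def bin_to_vmem(data: bytes, word_bytes: int = 4) -> str:
    """Render a binary image as hex words, one per line ($readmemh style).

    Assumptions (not taken from the OpTiMSoC sources): 32-bit words,
    big-endian byte order, zero padding of an incomplete last word.
    """
    # Pad the image so that it ends on a word boundary.
    remainder = len(data) % word_bytes
    if remainder:
        data += bytes(word_bytes - remainder)

    lines = ["@00000000"]  # address of the first word
    for offset in range(0, len(data), word_bytes):
        word = data[offset:offset + word_bytes]
        lines.append(word.hex())  # bytes in memory order, i.e. big-endian
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Six arbitrary bytes: one full word and one word that gets padded.
    print(bin_to_vmem(bytes([0x18, 0x00, 0x00, 0x00, 0xA8, 0x21])), end="")
```

A textual format like this is convenient because the simulator can load it without any knowledge of the ELF format.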

But how does the actual printf-output get there when there is no UART or similar?

OpTiMSoC software makes extensive use of a useful feature of the OpenRISC ISA. The “no operation” instruction l.nop takes a parameter K in assembly. The parameter has no effect on the program but can be observed in simulation, so it can be used for instrumentation, tracing, or special purposes such as writing characters or terminating the simulation with minimal intrusion.

Termination is forced with l.nop 0x1. A trace monitor observes this instruction and terminates the simulation once it has been observed on all cores (shortly after main has returned).

With this method you can simply provide constants to your simulation environments. For variables this method is extended by putting data in further registers (often r3). This still is minimally intrusive and allows you to trace values. The printf is also done that way (see newlib):

    void sim_putc(unsigned char c) {
      asm("l.addi\tr3,%0,0": :"r" (c));
      asm("l.nop %0": :"K" (NOP_PUTC));
    }

This function is called from printf as the write function. The trace monitor captures these characters and writes them to the stdout file.

You can easily add your own “traces” using a macro defined in baremetal-libs/src/libbaremetal/include/optimsoc-baremetal.h:

    #define OPTIMSOC_TRACE(id,v)                \
      asm("l.addi\tr3,%0,0": :"r" (v) : "r3"); \
      asm("l.nop %0": :"K" (id));
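The host side of this mechanism can be pictured with a small Python model. It is not the trace monitor shipped with OpTiMSoC, only a sketch of the idea described above: every observed l.nop is reported together with its parameter K and the content of r3. The class and method names are hypothetical, the value 0x4 for NOP_PUTC is an assumption that should be checked against your newlib headers, and treating every other non-zero K as a user trace is a simplification.

```python
NOP_EXIT = 0x1  # termination code named in the text
NOP_PUTC = 0x4  # assumed value of NOP_PUTC, not confirmed by this chapter


class TraceMonitor:
    """Toy model of a monitor that watches the l.nop instructions of all cores."""

    def __init__(self, num_cores: int):
        self.stdout = {core: [] for core in range(num_cores)}
        self.terminated = set()
        self.user_traces = []  # (core, id, value) tuples

    def observe(self, core: int, k: int, r3: int) -> bool:
        """Handle one l.nop with parameter k; r3 is the register content.

        Returns True once every core has executed the exit l.nop.
        """
        if k == NOP_PUTC:
            # Character output: the character was placed in r3.
            self.stdout[core].append(chr(r3 & 0xFF))
        elif k == NOP_EXIT:
            self.terminated.add(core)
        elif k != 0:
            # Simplification: any other non-zero K counts as a user trace.
            self.user_traces.append((core, k, r3))
        return len(self.terminated) == len(self.stdout)

    def output(self, core: int) -> str:
        """Return what would end up in the stdout file of this core."""
        return "".join(self.stdout[core])


if __name__ == "__main__":
    monitor = TraceMonitor(num_cores=2)
    for ch in "Hi":
        monitor.observe(0, NOP_PUTC, ord(ch))
    monitor.observe(1, 0x20, 42)          # hypothetical trace id 0x20
    monitor.observe(0, NOP_EXIT, 0)
    finished = monitor.observe(1, NOP_EXIT, 0)
    print(monitor.output(0), monitor.user_traces, finished)
```

The model also shows why the approach is minimally intrusive: the software only executes one or two extra instructions, and all bookkeeping happens in the observer.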

## 3.2 Going Multicore: Simulate Multicore Compute Tiles

Next you might want to build an actual multicore system. In a first step, you can just start simulations of compute tiles with multiple cores:

    $OPTIMSOC/examples/dm/compute_tile-dual/tb_compute_tile --standalone --meminit=hello.vmem
    $OPTIMSOC/examples/dm/compute_tile-quad/tb_compute_tile --standalone --meminit=hello.vmem
    $OPTIMSOC/examples/dm/compute_tile-octa/tb_compute_tile --standalone --meminit=hello.vmem

As you can observe, the simulation runs become longer. After each run, inspect the stdout.* files.

## 3.3 Tiled Manycore SoC: Simulate a Small 2x2 Distributed Memory System

Next we want to run an actual NoC-based tiled manycore system-on-chip. The examples include system_2x2_cccc. The names of all pre-packaged systems first denote the dimensions and then the instantiated tiles, here cccc for four compute tiles. Run it to get the hello world experience again:

    $OPTIMSOC/examples/dm/system_2x2_cccc/tb_system_2x2_cccc --standalone --meminit=hello.vmem
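The naming scheme of the pre-packaged systems can be decoded mechanically, which may be handy in scripts that run several systems. The following Python sketch is not part of OpTiMSoC, and the function name parse_system_name is hypothetical. It only knows the tile letter c, handles only names of the plain form system_AxB_tiles, and makes no assumption about which of the two dimensions comes first.

```python
import re

# Only the tile letter described in the text is known here.
TILE_TYPES = {"c": "compute"}


def parse_system_name(name: str):
    """Split a name such as 'system_2x2_cccc' into dimensions and tile types."""
    match = re.fullmatch(r"system_(\d+)x(\d+)_([a-z]+)", name)
    if match is None:
        raise ValueError(f"unexpected system name: {name!r}")

    dims = (int(match.group(1)), int(match.group(2)))
    letters = match.group(3)
    # One letter is expected per tile of the grid.
    if len(letters) != dims[0] * dims[1]:
        raise ValueError(f"{name!r}: tile count does not match the dimensions")

    tiles = [TILE_TYPES.get(letter, "unknown") for letter in letters]
    return dims, tiles


if __name__ == "__main__":
    print(parse_system_name("system_2x2_cccc"))
```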

In our simulation all cores in the four tiles run the same software. Before you shout “that’s boring”: you can still write different code depending on which tile and core the software is executed on. A couple of functions are useful for that:

• optimsoc_get_numct(): The number of compute tiles in the system

• optimsoc_get_numtiles(): The number of tiles (of any type) in the system

• optimsoc_get_ctrank(): Get the rank of this compute tile in this system. Essentially this is just a number that uniquely identifies a compute tile.

More utility functions like these are available; you can find them in the file baremetal-libs/src/libbaremetal/include/optimsoc-baremetal.h.

A simple application that uses those functions to do message passing between the different tiles is hello_mpsimple. This program uses the simple message passing facilities of the network adapter to send messages. All cores send a message to core 0. Once all messages have been received, core 0 prints the message “Received all messages. Hello World!”.
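The communication pattern of this program can be modeled on a host PC without any OpTiMSoC code. The Python sketch below is such a model and not the actual application: a queue stands in for the receive side of the network adapter, the function name run_hello_pattern is hypothetical, and none of the real message passing API is used. The first log line is modeled on what the real program prints when it runs on four tiles.

```python
from queue import Queue


def run_hello_pattern(num_tiles: int):
    """Model: every tile except rank 0 sends one message to rank 0."""
    inbox = Queue()  # stands in for the receive side of tile 0
    log = []

    # Ranks 1..n-1 each send their own rank as the message payload.
    for rank in range(1, num_tiles):
        inbox.put(rank)

    # Rank 0 knows how many messages to expect and counts them.
    expected = num_tiles - 1
    log.append(f"Wait for {expected} messages")
    senders = []
    while len(senders) < expected:
        senders.append(inbox.get())

    log.append("Received all messages. Hello World!")
    return log, sorted(senders)


if __name__ == "__main__":
    lines, senders = run_hello_pattern(4)
    print("\n".join(lines))
    print("senders:", senders)
```

In the real system the messages arrive in an order that depends on the network, which is why the receiver only counts them instead of expecting a particular sequence.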

    cd ../hello_mpsimple
    make
    $OPTIMSOC/examples/dm/system_2x2_cccc/tb_system_2x2_cccc --standalone --meminit=hello_mpsimple.vmem

Have a look at what the software does (you can find the code in $OPTIMSOC_APPS/baremetal/hello_mpsimple/hello_mpsimple.c). Let’s first check the output of core 0.

    $> cat stdout.000
    Wait for 3 messages
    Received all messages. Hello World!

Finally, let’s have a quick glance at a more realistic application: heat_mpsimple. You can find it in the same place as the previous applications, hello and hello_mpsimple. The application calculates the heat distribution in a distributed manner. The cores coordinate their boundary regions by sending messages around.

Can you compile this application and run it? Don’t get nervous, the simulation can take a couple of minutes to finish. Have a look at the source code and try to understand what’s going on. Also have a look at the stdout log files. Core 0 will also print the complete heat distribution at the end.

## 3.4 The Look Inside: Introducing the Debug System

Note: This and the following sections have not been tested for this release and will most probably not run as described. As a reference, however, they can help you to better understand OpTiMSoC. They will be rewritten for the upcoming release. Thanks for your understanding!

In the previous tutorials you have seen some software running on a simple OpTiMSoC system. Until now, you have only seen the output of the applications, not how the system works on the inside.

This is one of the major problems in embedded systems: you cannot easily look inside (especially as soon as you run on real hardware as opposed to simulation). In more technical terms, the system’s observability is limited. A common way to overcome this is to add a debug and diagnosis infrastructure to the SoC which transfers data from the system to the outside world, usually to a developer’s PC.

OpTiMSoC also comes with an extensive debug system. In this section, we’ll have a look at this system, how it works and how you can use it to debug your applications. But before diving into the details, we’ll briefly discuss the basics which are necessary to understand the system.

Many developers know debugging from their daily work. Most of the time it involves running a program inside a debugger like GDB or Microsoft Visual Studio, setting a breakpoint at the right line of code, and stepping through the program from there on, running one instruction (or one line of code) at a time.

This technique is what we call run-control debugging. While it works great for single-threaded programs, it cannot easily be applied to debugging parallel software running on a possibly heterogeneous many-core SoC. Instead, our solution is based solely on tracing, i.e. collecting information from the system while it is running and looking at this data later to figure out the root cause of a problem.

The debug system consists of two main parts. The hardware part runs on the OpTiMSoC system itself and collects all data. The other part runs on a developer’s PC (often also called the host PC), controls the debugging process and displays the collected data. Both parts are connected using either a USB connection (e.g. when running on the ZTEX boards) or a TCP connection (when running OpTiMSoC in simulation).

## 3.5 Verilator: Compiled Verilog simulation

At the moment running “verilated” simulations is the best supported way of observing the system traces. We will therefore run the examples from before in a verilated simulation and observe the system in the graphical user interface.

In the following we will have a look at building such a system and at how to observe it with the GUI. In tbench/verilator/dm you find systems identical to the RTL simulation. We will directly start with system_2x2_cccc. In the base folder you should simply make it:

    $> make

The command will first generate the verilated version of the 2x2 system. Finally it builds the toplevel files and links to tb_system_2x2_cccc and tb_system_2x2_cccc-vcd. The latter generates a full VCD trace file of the hardware, which is much slower and also easily takes up tens of GB.

Similar to the steps described above you will need to build the software, e.g., the heat example. Again you need to link the vmem file. Now start the simulation:

    $> ./tb_system_2x2_cccc

It will start a debug server and wait for connections:

    SystemC 2.3.0-ASI --- Feb 11 2013 12:54:17
    Copyright (c) 1996-2012 by all Contributors,
    ALL RIGHTS RESERVED
    Listening on port 2200

In another console now start the OpTiMSoC GUI:

    $> optimsocgui

In the first dialog window you need to set the debug backend to Simulation TCP Interface and then proceed. After the GUI has started you need to connect using Target System → Connect. The system view should change to a 2x2 system.

The last step is to run the system with Target System → Start CPUs. The execution trace at the bottom of the window will start showing execution sections and events. Moving the mouse over a section shows its description. Similarly, each event comes with a short description.

## 3.6 Going to the FPGA: ZTEX Boards

The ZTEX boards are the recommended platform for software development or any other system which needs no I/O. Various variants exist; the supported boards are the 1.15 series versions b and d, where the latter is twice as large as the former and can therefore contain more processor cores. The 2x2 example works with both boards.

### 3.6.1 Prepare: Simulate the Complete System

Before we go to the actual board we want to simulate the entire system on the FPGA to see whether the debug system and the clocks work correctly.

The distribution therefore contains a SystemC module that functionally behaves like the USB chip on the ZTEX boards. The host tools can connect to the debug system via this module using a TCP socket.

The system can be found at tbench/rtl/dm/system_2x2_cccc_ztex. Build the system running make. Before you simulate the system you will now need to provide a modelsim.ini either globally or in the system’s folder that contains the Xilinx libraries. Once you have it, you can start the system using

    $> make sim-noninteractive

The simulation will start and you can now connect to the system in a different shell by using the command line interface:

    $> optimsoc_cli -bdbgnoc -oconn=tcp,host=localhost,port=2300

The command line interface will connect to the system and enumerate all debug modules:

    Connected to system.
    System ID: 0x0000ce75
    Module summary:
    addr.   type    version name
    0x02    0x02    0x00    ITM
    0x03    0x02    0x00    ITM
    0x04    0x02    0x00    ITM
    0x05    0x02    0x00    ITM
    0x06    0x05    0x00    STM
    0x07    0x05    0x00    STM
    0x08    0x05    0x00    STM
    0x09    0x05    0x00    STM
    0x0a    0x07    0x00    MAM
    0x0b    0x07    0x00    MAM
    0x0c    0x07    0x00    MAM
    0x0d    0x07    0x00    MAM

The modules are the Instruction Trace Module (ITM), Software Trace Module (STM) and Memory Access Module (MAM) for each of the four cores.
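If you capture this listing in a script, for example to check that every expected module is present, it can be parsed with a few lines of Python. The sketch below is not part of the OpTiMSoC host tools. The function name parse_module_summary is hypothetical, and the parser assumes that module rows consist of exactly four whitespace-separated fields starting with a hexadecimal address, as in the listing shown above.

```python
from collections import Counter

# Shortened copy of the listing above, used as sample input.
SAMPLE = """\
Connected to system.
System ID: 0x0000ce75
Module summary:
addr.   type    version name
0x02    0x02    0x00    ITM
0x03    0x02    0x00    ITM
0x06    0x05    0x00    STM
0x0a    0x07    0x00    MAM
"""


def parse_module_summary(text: str):
    """Extract the module rows of an enumeration listing as dictionaries."""
    modules = []
    for line in text.splitlines():
        fields = line.split()
        # Module rows have four fields and start with a hex address;
        # this skips the banner lines and the column header.
        if len(fields) == 4 and fields[0].startswith("0x"):
            addr, mod_type, version, name = fields
            modules.append({
                "addr": int(addr, 16),
                "type": int(mod_type, 16),
                "version": int(version, 16),
                "name": name,
            })
    return modules


if __name__ == "__main__":
    found = parse_module_summary(SAMPLE)
    print(Counter(module["name"] for module in found))
```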

Before debugging, you will need to build the software in the sw subfolder as described before. Once you have built hello_simplemp you can execute it in the simulation.

First you enter interactive mode:

    $> optimsoc_cli -bdbgnoc -oconn=tcp,host=localhost,port=2300

After enumeration you will get an OpTiMSoC> shell. First you can initialize the memories:

    OpTiMSoC> mem_init hello_simplemp.bin 0-3

Next you need to enable logging of the software trace events to a file:

    OpTiMSoC> log_stm_trace strace

Then start the system:

    OpTiMSoC> start

Let it run for a while (1 minute) and then leave the command line interface:

    OpTiMSoC> quit

After that you will find the expected output of the trace events in strace.

### 3.6.2 Synthesis

Once you have checked the correct functionality of the system (or of your extensions) you can move on to system synthesis for the FPGA. At the moment we support the Synopsys FPGA flow (Synplify).

You can find the system synthesis in the folder syn/dm/system_2x2_cccc_ztex. A Makefile is used to build the systems. To generate the system, first create the project file:

    $> make synplify.prj

Now the Synplify project file has been generated and you’re ready to start the synthesis.

If you want to have the output of the synthesis in a folder different from your source folder (the one where you just ran make in), you can set the environment variable OPTIMSOC_SYN_OUTDIR to any path you like, e.g. put

    export OPTIMSOC_SYN_OUTDIR=$HOME/syn

in your profile script (e.g. your ~/.bashrc file) and reload it. Run the synthesis afterwards (for the ZTEX 1.15b or d board):

    $> make synplify_115b_ddr

Once the synthesis is finished you can generate the bitstream:

    $> make bitgen_115d_ddr

### 3.6.3 Testing on the FPGA

Now that you have generated a bitstream we’re ready to upload it to the FPGA. Connect the ZTEX 1.15 board to your PC via USB. If you run lsusb the board identifies itself as:

    Bus 001 Device 004: ID 221a:010

No manufacturer or further information is displayed. The reason is that OpTiMSoC would otherwise have to buy its own set of USB identifiers. Instead, all ZTEX boards share the same identifier, and the following command is used to find out details on the firmware, board and capabilities:

    $> FWLoader -c -ii

To use the ZTEX boards as a user, it is recommended to add the following udev rule

    SUBSYSTEM=="usb", ATTR{idVendor}=="221a", ATTR{idProduct}=="0100", MODE="0666"

for example in /etc/udev/rules.d/60-usb.rules.

If you are running OpTiMSoC on the board for the first time you need to update the firmware on the board. To do that, switch to the folder src/sw/firmware/ztex_usbfpga_1_15_fx2_fw in your OpTiMSoC source tree. Follow the instructions inside the provided README file to build and flash the board with the required firmware. All of this only needs to be done once for each board (until the firmware changes).

Now the board will identify itself using FWLoader -c -ii:

    bus=001  device=4 (004')  ID=221a:100
       Manufacturer="TUM LIS"  Product="OpTiMSoC - ZTEX USB 1.15"    SerialNumber="04A32DBCFA"
       productID=10.13.0.0  fwVer=0  ifVer=1
       FPGA configured
       Capabilities:
          EEPROM read/write
          FPGA configuration
          Flash memory support
          High speed FPGA configuration
          MAC EEPROM read/write

Everything ready to go? Then upload the bitstream to the FPGA by running

    $> make flash_115d_ddr

in the same folder where you have been running make bitstream_... etc. in the previous section. The output should be something like

    FWLoader -v 0x221a 0x100 -f -uf /[somepath]/system_2x2_cccc_ztex.bit
    FPGA configuration time: 194 ms

As the FPGA is now ready you can use the same method to connect to the FPGA and load software onto it as in Section 3.6.1; only the connection parameters used in optimsoc_cli are a bit different this time. Run

    $> optimsoc_cli -i -bdbgnoc -oconn=usb

to connect to the FPGA board over USB. You should again be presented with a listing of all available debug modules. Now you can continue just as you did before by calling mem_init to load some software onto the FPGA, etc.

Congratulations, you’ve run OpTiMSoC on real hardware for the first time! You can now develop software and explore OpTiMSoC. A handy utility is the Python interface to the command line interface. Instead of using the interactive mode you can run the script interface like this:

    $> optimsoc_cli -s  -bdbgnoc -oconn=usb

An example Python script:

    mem_init(2,"hello_simple.bin")
    log_stm_trace("strace")
    start()

You can also connect to the USB now using the GUI. Now you’re ready to explore and customize OpTiMSoC for yourself. Have fun!