- About this Document
- Prerequisites
- Repository Structure
- Development Strategies
- Framework Features
- Web Client Development
- Kernel Development
1st CLaaS is built to enable initial development on your local machine. Local development, using RTL simulation of the custom kernel is covered in this document. When it comes time to optimize and deploy your kernel, 1st CLaaS provides infrastructure to do so using AWS F1 with Xilinx tools. These are covered in separate documents. Getting Started with F1 will get you started, and the Optimization and Deployment Guide will be your AWS/F1 reference.
Before utilizing this development guide, you should first have an understanding of 1st CLaaS by reading the repository's README.md, and you should have gotten your environment up and running stepping through Getting Started.
This Git repository contains all that is needed to develop a hardware-accelerated web application, including:
- the generic 1st Class Framework
- sample applications that utilize the framework
The framework supports:
- developing a web client on a local machine (Ubuntu 16.04 for us)
- developing the custom hardware kernel locally
- deploying your accelerated application on AWS F1 machines
The top-level file structure is:
framework
: files that may be used by any or all applications- content that provides the generic 1st CLaaS framework
apps
:- app: individual applications utilizing the framework
- content for this specific app
- app: individual applications utilizing the framework
aws
: Scripts, etc. used to control AWS instances.bin
: top-level executables (not all scripts are in a/bin
directory).regress
: a regression script that launches framework and app regression scripts
doc
: documentation (including this document)init
: a script to set up the repository for developmentsdaccel_setup
: a script to set up for development using Xilinx tools on AWS
An application foo
is constructed from code in framework
and apps/foo
. framework
and apps/foo
contain files/directories using nearly identical structure:
bin
: executablesregress
: regression script
webserver
: python web server applicationclient
: browser application code served by web serverhtml
: HTMLjs
: JavaScriptcss
: CSS (Cascading Style Sheets)
host
: OpenCL/C++ host applicationfpga
: content for the FPGA kernel, including Verilog/TL-Verilog source codescripts
: (framework) utility scripts needed to create a Vivado RTL projectsrc
: Verilog/SystemVerilog/TL-Verilog/etc. source code for the kernel
terraform
: content related to platform configuration (creating/starting/stopping/destroying EC2 instances) using Terraformbuild
: content related to building and runningMakefile
: nearly all build automation and interactive commands
out
: the target directory for build results, including the host application binary and FPGA images; (not committed with the repository)prebuilt
: prebuilt host application, and public Amazon FPGA Image (AFI) identifier; these may be committed for convenience so users can run new installations without lengthy builds
Your development strategy--how you partition development tasks--will depend on your specific application, but this section describes some possible strategies and considerations, and development infrastructure provided by 1st CLaaS.
It is perfectly reasonable to develop your own web/cloud application using any technologies and infrastructure you choose and utilize this repository just for the FPGA microservice development (everything server-side). The web infrastructure provided in this repository is intentionally basic to illustrate the framework with relative simplicity. Notably, for package management this repository simply utilizes git submodules rather than using a package manager like npm. Web library content provided in this repository is not currently bundled for package managers, like npm, (but that can easily be remedied).
An actual F1, and even AWS as a whole, are needed very little during development, especially early on. Four modes of execution are supported by the build process, controlled by the TARGET
Make variable, and requiring different platforms:
- Software (
sw
): An optional mode where the RTL kernel is not utilized. Custom C++ code is required to provide emulated kernel behavior. - Simulation (
sim
): Verilator is used for 2-state simulation of the custom RTL kernel. Verilator creates a C++ model of the kernel which is directly compiled in with the host executable. The Host Application C++ code controls the kernel clock and decides when to send/receive data to/from the kernel. - Hardware Emulation (
hw_emu
): This mode is supported by Xilinx SDAccel on AWS. All FPGA logic is simulated, including the custom kernel and surrounding shell logic. This runs much slower than Simulation. - Hardware (
hw
): Uses a real F1 FPGA.
Generally, most kernel development is done in Simulation. Hardware Emulation is used to refine the implementation of the design and may catch a few new bugs because of 4-state modeling and test bench differences. Hardware compilation is primarily for testing the application at speed and for deployment.
It is often reasonable to co-develop the client and kernel as a single application. Other times it can be beneficial to develop the application and kernel separately, in which case test benches with fake FPGA kernel and fake web application will be needed.
Software development might benefit from fake FPGA behavior at various points to speed development and provide control over testing:
- In the Web Client Application, avoiding any server-side functionality altogether. Especially when web development is in a separate repository/framework this can help to cleanly partition development
- In the Web Server and/or Host Application (
sw
target). This enables local development without RTL. (There is currently no specific build target for Web Server without Host, but this could be added.)
FPGA Kernel development might justify test bench development at various points:
- At the custom kernel interface: This limits the design under test to just RTL, and enables any RTL-based testing methodology and framework. Some options would be:
- Vivado on AWS.
- For TL-Verilog kernels, it can be natural to use a TL-Verilog test bench and, for simple test benches, Makerchip.
- At the Xilinx "Custom Logic" boundary: This is also RTL-only. A default test bench leveraging a Xilinx example is available for use (though it is not in use and is probably broken?).
- In the Host Application: A default test bench leveraging a Xilinx example is available for use (though it is not in use and is probably broken?). This would be most useful if OpenCL or RTL shell logic changes are made.
- In the Web Server: If you like Python.
- In a test bench web client: A default web client test bench is provided (as used by the
vadd
example) allowing manual entry of data to send to the kernel - In the real Web Client Application: This would be a options in the Web Client Application to control stimulus for development.
The Makefile
encapsulates just about every command provided by the framework for development. See header comments in <repo>/framework/build/Makefile
for details.
The web server provides, or can provide, the following REST API features.
- Self identification: The web server can determine its own IP to enable direct communication in scenarios where its content is served through a proxy.
- Starting/Stopping an F1 instance for acceleration.
For now, follow the lead of <repo>/framework/client/*/testbench.*
. Methods provided by <repo>/framework/client/js/fpgaServer.js
are very much subject to change.
This section describes development of custom FPGA RTL kernels using the provided Kernel API. Development that modifies the provided kernel interface is not covered here, nor elsewhere (other than in the commented code).
A custom FPGA kernel named foo
is provided as a SystemVerilog module with the following interface. Note that this is subject to change).
module foo_kernel #(
parameter integer C_DATA_WIDTH = 512 // Data width of both input and output data
// (This will be instantiated with its default value of 512.)
) (
input wire clk,
input wire reset,
// Input interface:
output wire in_ready,
input wire in_avail,
input wire [C_DATA_WIDTH-1:0] in_data,
// Output interface:
input wire out_ready,
output wire out_avail,
output wire [C_DATA_WIDTH-1:0] out_data
);
This includes a streaming interface at the input and another at the output. Unless there are modifications to the intervening software layers, the data streaming into the kernel is the data sent from the Web Client application (or cloud application) communicating via the WebSocket, and the data sent from the kernel is received by the Web Client.
The streaming interface includes, ready
, avail
, and data
. This is similar to a standard AXI streaming interface, generally referred to as a ready/valid protocol, though the naming is chosen to be consistent with TL-Verilog naming conventions.
In a given cycle, these signals relate to the same data transfer, where:
ready
: The receiver is willing to accept the data.avail
: The transmitter has data to deliver.data
: The data.
Any means of delivering this kernel can be utilized, but we are partial to developing with TL-Verilog in the makerchip.com IDE. If you are unfamiliar, TL-Verilog and Makerchip are bleeding edge, meaning they are really cool technology, but you will experience some growing pains. We encourage you to use them and support the revolution, and we'll help as best we can.
TL-Verilog is supported in 1st CLaaS, using Redwood EDA's SandPiper(TM) SaaS edition running as a cloud service. The Makefile
uses this free (though as-is) service to compile .tlv
files into .sv
files. A local installation of SandPiper can also be used.
Considerations for using Makerchip:
- Pros:
- No licensing; no installation; all-in-one IDE.
- Easy: Even without a test bench, Makerchip will generate random stimulus.
- Built for TL-Verilog: Integrates the abstractions of TL-Verilog: pipelines, transactions, etc.
- Cons:
- Not seamless: Cut-and-paste files to import/export.
- For small/medium kernels: Limited to single-file editing. For large multi-file kernels, Makerchip, in its current state, would be awkward (though we do it).
- Verilator is used for compilation, so no SystemVerilog/UVM test bench features are available, just TL-Verilog.
Expect the pros to grow and the cons to shrink over time.
To use Makerchip for development, copy .tlv
code into the Makerchip/a> IDE, and copy back when finished (and features are in the plans to automate this). Development templates will be provided over time.
This link opens the latest Mandelbrot Kernel from GitHub in the Makerchip IDE.
A WIP library of generic TL-Verilog components which mostly utilize the ready/avail interface protocol can be found here.