110 changes: 102 additions & 8 deletions README.md
@@ -1,18 +1,112 @@
CUDA Rasterizer
===============

[CLICK ME FOR THE INSTRUCTIONS FOR THIS PROJECT](./INSTRUCTION.md)

**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4**

* Jie Meng
* [LinkedIn](https://www.linkedin.com/in/jie-meng/), [YouTube](https://www.youtube.com/channel/UC7G8fUcQrrI_1YnXY5sQM6A).
* Tested on: Windows 10, i7-7700HQ @ 2.80GHz, 16GB, GTX 1050 4GB (Personal Laptop)

Demo renders
================

![](renders/11.gif)


Rasterization
==================
_Rasterization_ [[wiki]](https://en.wikipedia.org/wiki/Rasterisation) is a rendering method in computer graphics. In essence, rendering a 3D scene by rasterization takes two steps:

- Project the 3D objects onto the 2D camera plane.
- Given the 2D plane (with a fixed resolution) and the projections of the 3D shapes, fill the pixels that lie inside those projected shapes.

The process is visualized in the following images:

Project | Fill pixels
------------|---------------
![](renders/ras1.jpg) | ![](renders/ras2.png)

With the GPU and CUDA, this pipeline can run very fast, because vertices, primitives, and pixels can be processed in parallel by many threads.
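To make the "fill pixels" step concrete, here is a minimal CPU-side C++ sketch (not this project's CUDA kernel) of the common edge-function approach: for each pixel center inside the triangle's screen-space bounding box, the three edge functions, normalized by the triangle's area, give barycentric coordinates, and the pixel is covered when all three are non-negative. `Vec2`, `edgeFunction`, and `coveredPixels` are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// A point in screen space, measured in pixels.
struct Vec2 { float x, y; };

// Twice the signed area of triangle (a, b, c); its sign tells which side of
// the directed edge a->b the point c lies on.
float edgeFunction(const Vec2& a, const Vec2& b, const Vec2& c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Returns the indices (y * width + x) of all pixels whose centers lie inside
// or on the triangle (v0, v1, v2). Either winding order is accepted.
std::vector<int> coveredPixels(Vec2 v0, Vec2 v1, Vec2 v2, int width, int height) {
    assert(width > 0 && height > 0);
    std::vector<int> covered;
    const float area = edgeFunction(v0, v1, v2);
    if (area == 0.0f) return covered;  // degenerate triangle: nothing to fill

    // Only scan the triangle's bounding box, clamped to the screen.
    const int minX = std::max(0, (int)std::floor(std::min({v0.x, v1.x, v2.x})));
    const int maxX = std::min(width - 1, (int)std::ceil(std::max({v0.x, v1.x, v2.x})));
    const int minY = std::max(0, (int)std::floor(std::min({v0.y, v1.y, v2.y})));
    const int maxY = std::min(height - 1, (int)std::ceil(std::max({v0.y, v1.y, v2.y})));

    for (int y = minY; y <= maxY; ++y) {
        for (int x = minX; x <= maxX; ++x) {
            const Vec2 p{x + 0.5f, y + 0.5f};  // sample at the pixel center
            // Dividing by the total area yields barycentric coordinates and
            // makes the inside test independent of winding order.
            const float b0 = edgeFunction(v1, v2, p) / area;
            const float b1 = edgeFunction(v2, v0, p) / area;
            const float b2 = edgeFunction(v0, v1, p) / area;
            if (b0 >= 0.0f && b1 >= 0.0f && b2 >= 0.0f) covered.push_back(y * width + x);
        }
    }
    return covered;
}
```

On the GPU this loop is usually parallelized, for example with one thread per triangle scanning its bounding box. The barycentric coordinates computed here are typically also what later stages use to interpolate depth, normals, and UVs.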


Features
==================

### Graphics Features
#### Basic rasterization pipeline: Vertex transformation, Primitive Assembly, Rasterization, Fragment shading

![](renders/truck.gif)
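In a typical pipeline, vertex transformation maps each vertex from object space through the MVP matrix into clip space, divides by w to reach normalized device coordinates (NDC), and then maps NDC to pixel coordinates. The C++ sketch below assumes column vectors, a row-major `Mat4`, NDC x/y in [-1, 1], and a screen y axis pointing down; these conventions, and the names `Vec4`, `Mat4`, `mul`, and `toScreen`, are assumptions for illustration rather than this project's code.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // row-major: m[row][col]

// Matrix times column vector.
Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 r{};  // zero-initialized accumulator
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            r[row] += m[row][col] * v[col];
    return r;
}

// Object space -> clip space (MVP) -> NDC (divide by w) -> pixel coordinates.
// Returns {screenX, screenY, ndcZ, clipW}.
Vec4 toScreen(const Mat4& mvp, const Vec4& objPos, int width, int height) {
    const Vec4 clip = mul(mvp, objPos);
    assert(clip[3] != 0.0f);  // a real pipeline clips such vertices first
    const float invW = 1.0f / clip[3];
    const float ndcX = clip[0] * invW;
    const float ndcY = clip[1] * invW;
    const float ndcZ = clip[2] * invW;
    // Map [-1, 1] to [0, width] and [0, height], flipping y so row 0 is the top.
    const float sx = (ndcX + 1.0f) * 0.5f * width;
    const float sy = (1.0f - ndcY) * 0.5f * height;
    return {sx, sy, ndcZ, clip[3]};  // keep clip w for perspective-correct interpolation
}
```

Keeping the clip-space w alongside the screen position is a deliberate choice: perspective-correct interpolation of UVs needs it later.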

- Multiple primitive types: triangles, wireframe, points

Wireframe | Points
------------|---------------
![](renders/cow1.gif) | ![](renders/cow2.gif)

* Multiple primitive types are implemented in a simple way: instead of creating new **PrimitiveAssembly** kernels, I still treat every primitive as a triangle, but only **fill** the pixels on its **edges** (wireframe) or at its **vertices** (points).
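One way to realize this "keep triangles, fill fewer pixels" idea is to reuse the barycentric coordinates from the triangle coverage test: a pixel lies on an edge when one coordinate is close to 0, and near a vertex when one coordinate is close to 1. The C++ sketch below mirrors the `RenderMode` enum added to `src/main.hpp`; `shouldFill` and the thresholds `edgeEps` and `vertexEps` are hypothetical, not taken from the project.

```cpp
#include <algorithm>
#include <cassert>

// Mirrors the RenderMode enum added to main.hpp in this change.
enum RenderMode { Triangle, Wireframe, Points };

// Given the barycentric coordinates (b0, b1, b2) of a pixel already known to
// lie inside a triangle, decide whether that pixel is filled.
// edgeEps and vertexEps are hypothetical thresholds, not project values.
bool shouldFill(RenderMode mode, float b0, float b1, float b2,
                float edgeEps = 0.05f, float vertexEps = 0.1f) {
    assert(b0 >= 0.0f && b1 >= 0.0f && b2 >= 0.0f);
    switch (mode) {
    case RenderMode::Triangle:
        return true;  // fill the whole interior
    case RenderMode::Wireframe:
        // On an edge, the weight of the vertex opposite that edge is 0.
        return std::min({b0, b1, b2}) < edgeEps;
    case RenderMode::Points:
        // At a vertex, that vertex's weight is 1.
        return std::max({b0, b1, b2}) > 1.0f - vertexEps;
    }
    return false;
}
```

A caveat of thresholding barycentric coordinates is that edges of large triangles come out thicker (in pixels) than edges of small ones; testing the screen-space distance to each edge instead gives lines of uniform width.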


#### UV texture mapping with bilinear texture filtering and perspective correct texture coordinates

No bilinear filtering | Bilinear filtering
-------------------------- | ----------------------------
![](renders/no_bilinear_filter.jpg) | ![](renders/bilinear_filter.jpg)


* Bilinear texture filtering [[wiki]](https://en.wikipedia.org/wiki/Bilinear_filtering) reduces the blocky artifacts of nearest-texel lookups in texture mapping. Given a UV coordinate, its color is interpolated from the four nearest texels.
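A minimal C++ sketch of bilinear sampling, assuming a row-major RGB texture, UVs in [0, 1], texel centers at (i + 0.5) / size, and clamp-to-edge addressing. It finds the 2x2 block of texels around the sample point, interpolates horizontally along the top and bottom rows, then vertically between them. `Color`, `Texture`, `mixColor`, and `sampleBilinear` are illustrative names, not the project's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Color { float r, g, b; };

// Row-major RGB texture.
struct Texture {
    int width, height;
    std::vector<Color> texels;
    const Color& at(int x, int y) const { return texels[y * width + x]; }
};

// Linear blend between two colors: t = 0 gives a, t = 1 gives b.
Color mixColor(const Color& a, const Color& b, float t) {
    return {a.r + (b.r - a.r) * t, a.g + (b.g - a.g) * t, a.b + (b.b - a.b) * t};
}

// Bilinear sample at (u, v) in [0, 1]. Texel i has its center at (i + 0.5) / size,
// and lookups outside the texture are clamped to the border texels.
Color sampleBilinear(const Texture& tex, float u, float v) {
    assert(tex.width > 0 && tex.height > 0);
    const float x = u * tex.width - 0.5f;   // continuous texel coordinates
    const float y = v * tex.height - 0.5f;
    const int x0 = (int)std::floor(x);
    const int y0 = (int)std::floor(y);
    const float fx = x - x0;  // weight of the right-hand column
    const float fy = y - y0;  // weight of the lower row
    auto cx = [&](int i) { return std::min(std::max(i, 0), tex.width - 1); };
    auto cy = [&](int i) { return std::min(std::max(i, 0), tex.height - 1); };
    // Interpolate horizontally on the two rows, then vertically between them.
    const Color top = mixColor(tex.at(cx(x0), cy(y0)), tex.at(cx(x0 + 1), cy(y0)), fx);
    const Color bottom = mixColor(tex.at(cx(x0), cy(y0 + 1)), tex.at(cx(x0 + 1), cy(y0 + 1)), fx);
    return mixColor(top, bottom, fy);
}
```

Sampling exactly at a texel center returns that texel unchanged; sampling between two texels blends them, which is what smooths out the blocky look in the comparison above.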

#### Automatic rotation: a simple use of shared memory to store one uniform rotation matrix for all vertices, so that the model rotates by itself.
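The change to `src/main.cpp` accumulates `autoRotateAngle` from the frame time (`glfwGetTime`) and builds the matrix with `glm::rotate`, which takes its angle in radians in current GLM versions. The GLM-free C++ sketch below shows the same idea under that radians assumption: advance the angle by speed times elapsed time, wrap it at a full turn (2π), and apply the same Y-axis rotation to every vertex. The shared-memory part is CUDA-specific and not shown; `Vec3`, `advanceAngle`, and `rotateY` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Advance a rotation angle (in radians) by speed * dt and wrap it into [0, 2*pi).
float advanceAngle(float angle, float speedRadPerSec, float dtSeconds) {
    assert(dtSeconds >= 0.0f);
    const float twoPi = 6.28318530718f;
    angle = std::fmod(angle + speedRadPerSec * dtSeconds, twoPi);
    return angle < 0.0f ? angle + twoPi : angle;
}

// Rotate a point about the +Y axis (right-handed). Every vertex in a frame uses
// the same angle, i.e. the same "uniform" matrix.
Vec3 rotateY(const Vec3& p, float angle) {
    const float c = std::cos(angle);
    const float s = std::sin(angle);
    return {c * p.x + s * p.z, p.y, -s * p.x + c * p.z};
}
```

Under the radians assumption, wrapping at 2π (rather than at 360) is what keeps the orientation continuous when the angle wraps around.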

#### Post-processing: implemented bloom effect

Without bloom | With bloom
------------|---------------
![](renders/duck.gif) | ![](renders/duckbloom.gif)

* Bloom [[wiki]](https://en.wikipedia.org/wiki/Bloom_(shader_effect)) is a widely used post-processing effect that makes high-brightness areas glow. Shared memory is used to store the pixel values that each block needs to perform the Gaussian blur.
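A CPU-side C++ sketch of the usual bloom recipe, for illustration only: a bright pass keeps pixels above a threshold, a separable blur (a horizontal pass followed by a vertical pass) spreads them out, and the result is added back to the original image. It works on a single-channel image and uses 5-tap binomial weights as a small Gaussian approximation. `Image`, `blurPass`, `bloom`, `threshold`, and `intensity` are hypothetical, and the project's CUDA version (with shared memory and `DOWNSCALERATIO`) is not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Single-channel (luminance) image, row-major.
struct Image {
    int width, height;
    std::vector<float> px;
    float& at(int x, int y) { return px[y * width + x]; }
    float at(int x, int y) const { return px[y * width + x]; }
};

// One pass of a separable blur: along x for (dx, dy) = (1, 0), along y for (0, 1).
// Uses 5-tap binomial weights (1 4 6 4 1) / 16 and clamps samples at the borders.
Image blurPass(const Image& src, int dx, int dy) {
    assert((dx == 1 && dy == 0) || (dx == 0 && dy == 1));
    const float w[5] = {1 / 16.f, 4 / 16.f, 6 / 16.f, 4 / 16.f, 1 / 16.f};
    Image dst = src;
    for (int y = 0; y < src.height; ++y) {
        for (int x = 0; x < src.width; ++x) {
            float sum = 0.0f;
            for (int k = -2; k <= 2; ++k) {
                const int sx = std::min(std::max(x + k * dx, 0), src.width - 1);
                const int sy = std::min(std::max(y + k * dy, 0), src.height - 1);
                sum += w[k + 2] * src.at(sx, sy);
            }
            dst.at(x, y) = sum;
        }
    }
    return dst;
}

// Bloom: keep only pixels brighter than `threshold`, blur them, and add the
// result back on top of the original image.
Image bloom(const Image& in, float threshold, float intensity) {
    Image bright = in;
    for (float& v : bright.px) v = (v > threshold) ? v : 0.0f;
    const Image blurred = blurPass(blurPass(bright, 1, 0), 0, 1);
    Image out = in;
    for (std::size_t i = 0; i < out.px.size(); ++i) out.px[i] += intensity * blurred.px[i];
    return out;
}
```

Splitting the 2D Gaussian into two 1D passes cuts the work per pixel from k² to 2k samples for a k-tap kernel. In a CUDA kernel, each block can load its tile of pixels plus a 2-pixel border into shared memory once, so every thread reads its neighbors from shared memory instead of global memory.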

### GUI controls
- Number keys switch primitive types: triangles (1) / wireframe (2) / points (3)
- Key B toggles the bloom effect

### Macros
- `#define TEXTURE 1/0` : turn diffuse texturing on (1) or off (0)
- `#define PERSPECTIVE_INTERP 1/0` : turn perspective-correct interpolation on (1) or off (0)
- `#define TEXTURE_BILINEAR 1/0` : turn bilinear texture filtering on (1) or off (0)
- `#define DOWNSCALERATIO 3/(1-10)` : downscale ratio used in the Gaussian blur; any reasonable integer works (1-10 recommended)
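Interpolating UVs linearly in screen space (affine interpolation) distorts textures on triangles seen at an angle. The standard perspective-correct fix interpolates attr / w and 1 / w linearly in screen space and divides the two, where w is each vertex's clip-space w. The C++ sketch below assumes each vertex's clip-space w is available and positive (in front of the camera); `interpolatePerspective` is a hypothetical name, and this is not necessarily how the project implements `PERSPECTIVE_INTERP`.

```cpp
#include <array>
#include <cassert>

// Perspective-correct interpolation of one scalar vertex attribute (e.g. u or v):
//   result = (sum_i b_i * attr_i / w_i) / (sum_i b_i / w_i)
//   b    : screen-space barycentric weights of the fragment (sum to 1)
//   attr : attribute value at each of the three vertices
//   w    : clip-space w (view-space depth) of each vertex, assumed positive
float interpolatePerspective(const std::array<float, 3>& b,
                             const std::array<float, 3>& attr,
                             const std::array<float, 3>& w) {
    float num = 0.0f;  // sum of b_i * attr_i / w_i
    float den = 0.0f;  // sum of b_i / w_i
    for (int i = 0; i < 3; ++i) {
        assert(w[i] > 0.0f);
        num += b[i] * attr[i] / w[i];
        den += b[i] / w[i];
    }
    return num / den;
}
```

For instance, halfway (in screen space) between a vertex with w = 1 and attribute 0 and a vertex with w = 3 and attribute 1, affine interpolation gives 0.5, while the perspective-correct value is 0.25, closer to the nearer vertex. When all three w values are equal, both methods agree.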

Analysis
=================

### Basic render pipeline

* Configuration: x64 Release mode, no bilinear texture filtering, no bloom effect

#### Absolute running time

![](renders/perform1.png)

From the chart:
- The rasterization and render kernels consume most of the running time.
- Depth initialization is essentially one instruction per thread, so its running time can serve as a baseline.
- Vertex assembly loops over vertices, so fewer threads are launched than for depth initialization (one per vertex rather than one per pixel), resulting in less running time.
- The send-image-to-PBO stage does the same work for both models, so its running time is roughly the same.
- Since the cow model has more primitives than the duck model, its primitive assembly, rasterization, and render stages all take more time.

#### Running time percentages

![](renders/perform2.png)

The chart above shows each stage's running time as a percentage of the whole pipeline.
- Not surprisingly, rasterization takes the most time.
- For small models like the duck and the cow, rasterization takes only slightly more time than the render stage.
- For a more complex model like the truck, rasterization can consume most of the entire pipeline's time.

#### Bloom effect

In practice, the bloom effect has almost no impact on performance: no FPS drop is observed. I believe this is because bloom (and post-processing in general) operates only on the framebuffer, so its cost depends on the screen resolution rather than on the complexity of the scene, and it does not take much time.



### Credits

Binary file added renders/11.gif
Binary file added renders/bilinear_filter.jpg
Binary file added renders/bloom.gif
Binary file added renders/cow1.gif
Binary file added renders/cow2.gif
Binary file added renders/duck.gif
Binary file added renders/duckbloom.gif
Binary file added renders/no_bilinear_filter.jpg
Binary file added renders/perform1.png
Binary file added renders/perform2.png
Binary file added renders/ras1.jpg
Binary file added renders/ras2.png
Binary file added renders/truck.gif
59 changes: 56 additions & 3 deletions src/main.cpp
@@ -52,6 +52,10 @@ int main(int argc, char **argv) {

frame = 0;
seconds = time(NULL);

//to measure time performance
shaderTimer = time(NULL);

fpstracker = 0;

// Launch CUDA/GL
@@ -99,6 +103,16 @@ void mainLoop() {
float scale = 1.0f;
float x_trans = 0.0f, y_trans = 0.0f, z_trans = -10.0f;
float x_angle = 0.0f, y_angle = 0.0f;

//render mode toggle
int renderMode = RenderMode::Triangle;
// enable bloom effect
bool bloomEffect = true;

//for auto rotation
float autoRotateAngle = 0.0f;
float autoRotateSpeed = 0.3f;

void runCuda() {
// Map OpenGL buffer object for writing from CUDA on a single GPU
// No data is moved (Win & Linux). When mapped to CUDA, OpenGL should not use this buffer
@@ -119,8 +133,22 @@ void runCuda() {
glm::mat4 MV = V * M;
glm::mat4 MVP = P * MV;

double shaderTimerNew = glfwGetTime();
double deltaTime = shaderTimerNew - shaderTimer;
shaderTimer = shaderTimerNew;

//rotate the object
autoRotateAngle += autoRotateSpeed * deltaTime;
if (autoRotateAngle >= 360.0f)
{
autoRotateAngle = 0.0f;
}

//glm::mat4 autoRotateMat = glm::rotate(autoRotateAngle, glm::vec3(0.0f, 1.0f, 0.0f));
glm::mat4 autoRotateMat = glm::rotate(0.f, glm::vec3(0.0f, 1.0f, 0.0f));

cudaGLMapBufferObject((void **)&dptr, pbo);
rasterize(dptr, MVP, MV, MV_normal, renderMode, autoRotateMat, bloomEffect);
cudaGLUnmapBufferObject(pbo);

frame++;
@@ -189,6 +217,8 @@ bool init(const tinygltf::Scene & scene) {
glUseProgram(passthroughProgram);
glActiveTexture(GL_TEXTURE0);

shaderTimer = glfwGetTime();

return true;
}

@@ -325,9 +355,28 @@ void errorCallback(int error, const char *description) {
}

void keyCallback(GLFWwindow *window, int key, int scancode, int action, int mods) {
if (action == GLFW_PRESS) {
switch (key)
{
case GLFW_KEY_ESCAPE:
glfwSetWindowShouldClose(window, GL_TRUE);
break;
case GLFW_KEY_1:
renderMode = RenderMode::Triangle;
break;
case GLFW_KEY_2:
renderMode = RenderMode::Wireframe;
break;
case GLFW_KEY_3:
renderMode = RenderMode::Points;
break;
case GLFW_KEY_B:
bloomEffect = !bloomEffect;
break;
}

}

}

//----------------------------
@@ -396,5 +445,9 @@ void mouseMotionCallback(GLFWwindow* window, double xpos, double ypos)
void mouseWheelCallback(GLFWwindow* window, double xoffset, double yoffset)
{
const double s = 1.0; // sensitivity

if (yoffset > 0 && z_trans > -1.5f) {
return;
}
z_trans += (float)(s * yoffset);
}
7 changes: 7 additions & 0 deletions src/main.hpp
@@ -33,6 +33,13 @@ using namespace std;
int frame;
int fpstracker;
double seconds;

// To measure time performance
double shaderTimer;

//render Mode
enum RenderMode{Triangle, Wireframe, Points};

int fps = 0;
GLuint positionLocation = 0;
GLuint texcoordsLocation = 1;