diff --git a/README.md b/README.md index 41b91f0..6485c52 100644 --- a/README.md +++ b/README.md @@ -1,18 +1,112 @@ CUDA Rasterizer =============== -[CLICK ME FOR INSTRUCTION OF THIS PROJECT](./INSTRUCTION.md) - **University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4** -* (TODO) YOUR NAME HERE - * (TODO) [LinkedIn](), [personal website](), [twitter](), etc. -* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab) +* Jie Meng + * [LinkedIn](https://www.linkedin.com/in/jie-meng/), [YouTube](https://www.youtube.com/channel/UC7G8fUcQrrI_1YnXY5sQM6A). +* Tested on: Windows 10, i7-7700HQ @ 2.80GHz, 16GB, GTX 1050 4GB (Personal Laptop) + +Demo renders +================ + +![](renders/11.gif) + + +Rasterization +================== +_Rasterization_ [[wiki]](https://en.wikipedia.org/wiki/Rasterisation) is a render method in computer graphics, in essence, the process of showing 3D scenes containing objects are: + + - Project 3d objects into 2D camera plane + - Given the 2D plane (with fixed resolution) and the projections of 3D shapes, fill the pixels that are inside those projected shapes. + + The process can be visualized as the following images: + +Project | Fill pixels +------------|--------------- +![](renders/ras1.jpg) | ![](renders/ras2.png) + +With the power of GPU and CUDA, we could achieve great performance by multi-threading. + + +Features +================== + +### Graphics Features +#### Basic rasterization pipeline: Vertex transformation, Primitive Assembly, Rasterization, Fragment shading + +![](renders/truck.gif) + + - Multiple Primitive types: Triangles, Wire frame, Points + + wire frame | points +------------|--------------- +![](renders/cow1.gif) | ![](renders/cow2.gif) + +* The way I did multiple primitive types is a simple way: instead of creating new **PrimitiveAssembly** kernels, I still treat primitives as triangles, but only **fill** the pixels on **edges** or **vertices** + + +#### UV texture mapping with bilinear texture filtering and perspective correct texture coordinates + +No bilinear filtering | Bilinear filtering +-------------------------- | ---------------------------- +![](renders/no_bilinear_filter.jpg) | ![](renders/bilinear_filter.jpg) + + +* Bilinear texture mapping [[wiki]](https://en.wikipedia.org/wiki/Bilinear_filtering) is a way to avoid distortion when doing texture mapping. Given a uv coordinate, we use the nearest four texels to interpolate its color value. + +#### Automatic Rotation: a dumb way to use shared memory to store a uniform rotation matrix for all vertices, such that the model is rotating by itself. + +#### Post-processing: implemented bloom effect + + Without bloom | With bloom +------------|--------------- +![](renders/duck.gif) | ![](renders/duckbloom.gif) + + * bloom effect [[wiki]](https://en.wikipedia.org/wiki/Bloom_(shader_effect)) is a widely-used post processing method to present high-brightness areas. Shared memory is used to store pixel values that a block needed for perform gaussian blur. + +### GUI controls +- Number keys to switch primitive types: triangles(1) / wireframe(2) / points(3) +- Key B to turn on/off bloom effect + +### Macros +- `#define TEXTURE 1/0` : turn on(1)/off(0) to use diffuse texture +- `#define PERSPECTIVE_INTERP 1/0` : turn on(1)/off(0) to perform perspective correct interpolation +- `#define TEXTURE_BILINEAR 1/0` : turn on(1)/off(0) to use bilinear UV coordinates interpolation +- `#define DOWNSCALERATIO 3/(1-10)` : change down scale ration in gaussian blur, can be any reasonable integer (1-10 recommended) + +Analysis +================= + +### Basic render pipeline + +* Configuration: x64 Release mode, no bilinear texture, no bloom effect + +#### Absolute running time + +![](renders/perform1.png) + +From the chart: +- reastrization kernel and render kernel consumes the most part of running time +- Initialize depth is basically one instruction per thread, so its running time can be viewed as a baseline +- Vertex assembly is looping over vertices, so essentially less threads are launched comparing to depth initialization, resulting less running time. +- Send image to PBO stage is the same for both models, so the time spent is roughly the same +- Since Cow has more primitives than duck model, its primitive assembly, rasterization and render stage all take relatively more time + +#### Running time percentages + +![](renders/perform2.png) + +The above chart shows the percentages of each stage's running time w.r.t. whole pipeline. +- Not supurisingly, rasterization takes the most time +- For small model like duck and cow, rasterization only takes a bit more time that render stage +- For relatively more complex model like truck, rasterization stage can comsume most time of the entire pipeline + +#### Bloom effect + +In practice, it turns out that bloom effect has almost no effect on performances: no FPS dropping is happening. I believe it because bloom effect (or generally, post-processing) is performed simply on framebuffers, which should not take much time. -### (TODO: Your README) -*DO NOT* leave the README to the last minute! It is a crucial part of the -project, and we will not be able to grade you without a good README. ### Credits diff --git a/renders/11.gif b/renders/11.gif new file mode 100644 index 0000000..1ba7b57 Binary files /dev/null and b/renders/11.gif differ diff --git a/renders/bilinear_filter.jpg b/renders/bilinear_filter.jpg new file mode 100644 index 0000000..d2dd2e0 Binary files /dev/null and b/renders/bilinear_filter.jpg differ diff --git a/renders/bloom.gif b/renders/bloom.gif new file mode 100644 index 0000000..639cc24 Binary files /dev/null and b/renders/bloom.gif differ diff --git a/renders/cow1.gif b/renders/cow1.gif new file mode 100644 index 0000000..e24ab7b Binary files /dev/null and b/renders/cow1.gif differ diff --git a/renders/cow2.gif b/renders/cow2.gif new file mode 100644 index 0000000..34c4a9c Binary files /dev/null and b/renders/cow2.gif differ diff --git a/renders/duck.gif b/renders/duck.gif new file mode 100644 index 0000000..d6ff20c Binary files /dev/null and b/renders/duck.gif differ diff --git a/renders/duckbloom.gif b/renders/duckbloom.gif new file mode 100644 index 0000000..cdc9680 Binary files /dev/null and b/renders/duckbloom.gif differ diff --git a/renders/no_bilinear_filter.jpg b/renders/no_bilinear_filter.jpg new file mode 100644 index 0000000..10eb6f8 Binary files /dev/null and b/renders/no_bilinear_filter.jpg differ diff --git a/renders/perform1.png b/renders/perform1.png new file mode 100644 index 0000000..9f2e503 Binary files /dev/null and b/renders/perform1.png differ diff --git a/renders/perform2.png b/renders/perform2.png new file mode 100644 index 0000000..8385d1f Binary files /dev/null and b/renders/perform2.png differ diff --git a/renders/ras1.jpg b/renders/ras1.jpg new file mode 100644 index 0000000..17f8046 Binary files /dev/null and b/renders/ras1.jpg differ diff --git a/renders/ras2.png b/renders/ras2.png new file mode 100644 index 0000000..07cfcfd Binary files /dev/null and b/renders/ras2.png differ diff --git a/renders/truck.gif b/renders/truck.gif new file mode 100644 index 0000000..c48d174 Binary files /dev/null and b/renders/truck.gif differ diff --git a/src/main.cpp b/src/main.cpp index 7986959..10771e9 100644 --- a/src/main.cpp +++ b/src/main.cpp @@ -52,6 +52,10 @@ int main(int argc, char **argv) { frame = 0; seconds = time(NULL); + + //to measure time performance + shaderTimer = time(NULL); + fpstracker = 0; // Launch CUDA/GL @@ -99,6 +103,16 @@ void mainLoop() { float scale = 1.0f; float x_trans = 0.0f, y_trans = 0.0f, z_trans = -10.0f; float x_angle = 0.0f, y_angle = 0.0f; + +//render mode toggle +int renderMode = RenderMode::Triangle; +// enable blomm effect +bool bloomEffect = true; + +//for auto rotation +float autoRotateAngle = 0.0f; +float autoRotateSpeed = 0.3f; + void runCuda() { // Map OpenGL buffer object for writing from CUDA on a single GPU // No data is moved (Win & Linux). When mapped to CUDA, OpenGL should not use this buffer @@ -119,8 +133,22 @@ void runCuda() { glm::mat4 MV = V * M; glm::mat4 MVP = P * MV; + double shaderTimerNew = glfwGetTime(); + double deltaTime = shaderTimerNew - shaderTimer; + shaderTimer = shaderTimerNew; + + //rotate the object + autoRotateAngle += autoRotateSpeed * deltaTime; + if (autoRotateAngle >= 360.0f) + { + autoRotateAngle = 0.0f; + } + + //glm::mat4 autoRotateMat = glm::rotate(autoRotateAngle, glm::vec3(0.0f, 1.0f, 0.0f)); + glm::mat4 autoRotateMat = glm::rotate(0.f, glm::vec3(0.0f, 1.0f, 0.0f)); + cudaGLMapBufferObject((void **)&dptr, pbo); - rasterize(dptr, MVP, MV, MV_normal); + rasterize(dptr, MVP, MV, MV_normal, renderMode, autoRotateMat, bloomEffect); cudaGLUnmapBufferObject(pbo); frame++; @@ -189,6 +217,8 @@ bool init(const tinygltf::Scene & scene) { glUseProgram(passthroughProgram); glActiveTexture(GL_TEXTURE0); + shaderTimer = glfwGetTime(); + return true; } @@ -325,9 +355,28 @@ void errorCallback(int error, const char *description) { } void keyCallback(GLFWwindow *window, int key, int scancode, int action, int mods) { - if (key == GLFW_KEY_ESCAPE && action == GLFW_PRESS) { - glfwSetWindowShouldClose(window, GL_TRUE); + if (action == GLFW_PRESS) { + switch (key) + { + case GLFW_KEY_ESCAPE: + glfwSetWindowShouldClose(window, GL_TRUE); + break; + case GLFW_KEY_1: + renderMode = RenderMode::Triangle; + break; + case GLFW_KEY_2: + renderMode = RenderMode::Wireframe; + break; + case GLFW_KEY_3: + renderMode = RenderMode::Points; + break; + case GLFW_KEY_B: + bloomEffect = !bloomEffect; + break; + } + } + } //---------------------------- @@ -396,5 +445,9 @@ void mouseMotionCallback(GLFWwindow* window, double xpos, double ypos) void mouseWheelCallback(GLFWwindow* window, double xoffset, double yoffset) { const double s = 1.0; // sensitivity + + if (yoffset > 0 && z_trans > -1.5f) { + return; + } z_trans += (float)(s * yoffset); } diff --git a/src/main.hpp b/src/main.hpp index 4816fa1..b8fcdad 100644 --- a/src/main.hpp +++ b/src/main.hpp @@ -33,6 +33,13 @@ using namespace std; int frame; int fpstracker; double seconds; + +//TO measure time performance +double shaderTimer; + +//render Mode +enum RenderMode{Triangle, Wireframe, Points}; + int fps = 0; GLuint positionLocation = 0; GLuint texcoordsLocation = 1; diff --git a/src/rasterize.cu b/src/rasterize.cu index 1262a09..8581506 100644 --- a/src/rasterize.cu +++ b/src/rasterize.cu @@ -18,6 +18,12 @@ #include #include +#define TEXTURE 1 +#define PERSPECTIVE_INTERP 1 +#define TEXTURE_BILINEAR 0 + +#define DOWNSCALERATIO 3 + namespace { typedef unsigned short VertexIndex; @@ -34,6 +40,9 @@ namespace { Triangle = 3 }; + //render Mode + enum RenderMode { Triangles, Wireframe, Points }; + struct VertexOut { glm::vec4 pos; @@ -43,11 +52,14 @@ namespace { glm::vec3 eyePos; // eye space position used for shading glm::vec3 eyeNor; // eye space normal used for shading, cuz normal will go wrong after perspective transformation - // glm::vec3 col; + + glm::vec3 col; + glm::vec2 texcoord0; TextureData* dev_diffuseTex = NULL; // int texWidth, texHeight; // ... + int diffuseTexWidth, diffuseTexHeight; }; struct Primitive { @@ -62,10 +74,14 @@ namespace { // The attributes listed below might be useful, // but always feel free to modify on your own - // glm::vec3 eyePos; // eye space position used for shading - // glm::vec3 eyeNor; + glm::vec3 eyePos; // eye space position used for shading + glm::vec3 eyeNor; // VertexAttributeTexcoord texcoord0; - // TextureData* dev_diffuseTex; + TextureData* dev_diffuseTex; + + int uvStart; + glm::vec2 uvBilinear; + int texWidth, texHeight; // ... }; @@ -111,20 +127,47 @@ static glm::vec3 *dev_framebuffer = NULL; static int * dev_depth = NULL; // you might need this buffer when doing depth test +//used for atomic depth writing +static int * dev_mutex = NULL; +//used for bloom effect post processing +static glm::vec3 * dev_framebufferAux = NULL; +static glm::vec3 * dev_framebufferDownScaled = NULL; +static glm::vec3 * dev_framebufferDownScaledAux = NULL; + /** * Kernel that writes the image to the OpenGL PBO directly. */ __global__ -void sendImageToPBO(uchar4 *pbo, int w, int h, glm::vec3 *image) { +void sendImageToPBO(uchar4 *pbo, int w, int h, glm::vec3 *image, + //used for bloomEffect + int framebufferEdgeOffset, + int downScaleW, + int downScaleRatio +) +{ int x = (blockIdx.x * blockDim.x) + threadIdx.x; int y = (blockIdx.y * blockDim.y) + threadIdx.y; int index = x + (y * w); if (x < w && y < h) { + //frame buffer id + int fid; + if (downScaleRatio == 1) + { + fid = x + (y * w); + } + else + { + fid = (x / downScaleRatio) + framebufferEdgeOffset + + ((y / downScaleRatio) + framebufferEdgeOffset) * (downScaleW + 2 * framebufferEdgeOffset); + } glm::vec3 color; - color.x = glm::clamp(image[index].x, 0.0f, 1.0f) * 255.0; - color.y = glm::clamp(image[index].y, 0.0f, 1.0f) * 255.0; - color.z = glm::clamp(image[index].z, 0.0f, 1.0f) * 255.0; + //color.x = glm::clamp(image[index].x, 0.0f, 1.0f) * 255.0; + //color.y = glm::clamp(image[index].y, 0.0f, 1.0f) * 255.0; + //color.z = glm::clamp(image[index].z, 0.0f, 1.0f) * 255.0; + color.x = glm::clamp(image[fid].x, 0.0f, 1.0f) * 255.0; + color.y = glm::clamp(image[fid].y, 0.0f, 1.0f) * 255.0; + color.z = glm::clamp(image[fid].z, 0.0f, 1.0f) * 255.0; // Each thread writes one pixel location in the texture (textel) pbo[index].w = 0; pbo[index].x = color.x; @@ -133,23 +176,289 @@ void sendImageToPBO(uchar4 *pbo, int w, int h, glm::vec3 *image) { } } + +__device__ glm::vec3 sampleFromTexture(TextureData* dev_diffuseTex, int uvStart) { + return glm::vec3(dev_diffuseTex[uvStart + 0], + dev_diffuseTex[uvStart + 1], + dev_diffuseTex[uvStart + 2]) / 255.f; +} + +__device__ glm::vec3 sampleFromTextureBilinear(TextureData* dev_diffuseTex, glm::vec2 uv, int width, int height) { + // top right corner of grid cell + int u = uv.x; + int v = uv.y; + + // bottom right corner of grid cell + int u1 = glm::clamp(u + 1, 0, width - 1); + int v1 = glm::clamp(v + 1, 0, height - 1); + + // sample all 4 colors using indices to sample from texture + int id00 = (u + v * width) * 3; + int id01 = (u + v1 * width) * 3; + int id10 = (u1 + v * width) * 3; + int id11 = (u1 + v1 * width) * 3; + glm::vec3 c00 = sampleFromTexture(dev_diffuseTex, id00); + glm::vec3 c01 = sampleFromTexture(dev_diffuseTex, id01); + glm::vec3 c10 = sampleFromTexture(dev_diffuseTex, id10); + glm::vec3 c11 = sampleFromTexture(dev_diffuseTex, id11); + + // lerp horizontally using x fraction + glm::vec3 lerp1 = glm::mix(c00, c01, (float)uv.x - (float)u); + glm::vec3 lerp2 = glm::mix(c10, c11, (float)uv.x - (float)u); + + // return lerped color vertically using y fraction + return glm::mix(lerp1, lerp2, (float)uv.y - (float)v); +} + /** * Writes fragment colors to the framebuffer */ __global__ -void render(int w, int h, Fragment *fragmentBuffer, glm::vec3 *framebuffer) { +void render(int w, int h, Fragment *fragmentBuffer, glm::vec3 *framebuffer, + glm::vec3 lightPos, //for lighting calc + int renderMode, //switch rendermode + int framebufferEdgeOffset //bloom Effect + ) +{ int x = (blockIdx.x * blockDim.x) + threadIdx.x; int y = (blockIdx.y * blockDim.y) + threadIdx.y; - int index = x + (y * w); - if (x < w && y < h) { - framebuffer[index] = fragmentBuffer[index].color; + if (x < w && y < h) { + int index = x + (y * w); - // TODO: add your fragment shader code here + //framebuffer[index] = fragmentBuffer[index].color; - } + ////// TODO: add your fragment shader code here + + Fragment f = fragmentBuffer[index]; + if (renderMode == RenderMode::Triangles) + { + +#if TEXTURE + if (f.dev_diffuseTex) { +#if TEXTURE_BILINEAR + f.color = sampleFromTextureBilinear(f.dev_diffuseTex, f.uvBilinear, f.texWidth, f.texHeight); +#else + f.color = sampleFromTexture(f.dev_diffuseTex, f.uvStart); +#endif + } + else { + f.color = glm::vec3(0.f); + } +#endif + + glm::vec3 lightDir = glm::normalize(glm::vec3(0.5f, 0.2f, 0.7f) - f.eyePos); + //glm::vec3 lightDir = glm::normalize(lightPos - f.eyePos); + float lambert = glm::dot(lightDir, f.eyeNor); + float ambient = 0.3f; + float light_exp = 1.0f; + lambert = lambert * light_exp + ambient; + framebuffer[index] = f.color * lambert; + + } + //for wireframe and points render mode, pass the color + else if (renderMode == RenderMode::Wireframe || renderMode == RenderMode::Points) + { + framebuffer[index] = f.color; + } + } } +// bloom effect : horizontal Gaussian blur +__global__ +void gaussianBlurHorizon(int w, int h, glm::vec3 * framebufferIn, + glm::vec3 * framebufferOut, int framebufferEdgeOffset) +{ + //using shared memory to store shared frame buffer + __shared__ glm::vec3 framebufferIn_shared[144]; //144 = 8 * (5 + 8 + 5) + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + if (x < w && y < h) + { + //framebuffer index + int fid = (x + framebufferEdgeOffset) + ((y + framebufferEdgeOffset) * (w + 2 * framebufferEdgeOffset)); + framebufferOut[fid] = glm::vec3(0.0f); + //read framebufferIn into shared memory + //18 = 5 * 2 + 8 + int index = threadIdx.y * 18 + threadIdx.x + 5; + if (threadIdx.x == 0) + { + framebufferIn_shared[index - 5] = framebufferIn[fid - 5]; + framebufferIn_shared[index - 4] = framebufferIn[fid - 4]; + framebufferIn_shared[index - 3] = framebufferIn[fid - 3]; + framebufferIn_shared[index - 2] = framebufferIn[fid - 2]; + framebufferIn_shared[index - 1] = framebufferIn[fid - 1]; + } + if (threadIdx.x == blockDim.x - 1) + { + framebufferIn_shared[index + 5] = framebufferIn[fid + 5]; + framebufferIn_shared[index + 4] = framebufferIn[fid + 4]; + framebufferIn_shared[index + 3] = framebufferIn[fid + 3]; + framebufferIn_shared[index + 2] = framebufferIn[fid + 2]; + framebufferIn_shared[index + 1] = framebufferIn[fid + 1]; + } + framebufferIn_shared[index] = framebufferIn[fid]; + + __syncthreads(); + + //apply horizontal gaussian blur and write to output + framebufferOut[fid] += framebufferIn_shared[index - 5] * 0.0093f; + framebufferOut[fid] += framebufferIn_shared[index - 5] * 0.0093f; + framebufferOut[fid] += framebufferIn_shared[index - 4] * 0.028002f; + framebufferOut[fid] += framebufferIn_shared[index - 3] * 0.065984f; + framebufferOut[fid] += framebufferIn_shared[index - 2] * 0.121703f; + framebufferOut[fid] += framebufferIn_shared[index - 1] * 0.175713f; + framebufferOut[fid] += framebufferIn_shared[index] * 0.198596f; + framebufferOut[fid] += framebufferIn_shared[index + 1] * 0.175713f; + framebufferOut[fid] += framebufferIn_shared[index + 2] * 0.121703f; + framebufferOut[fid] += framebufferIn_shared[index + 3] * 0.065984f; + framebufferOut[fid] += framebufferIn_shared[index + 4] * 0.028002f; + framebufferOut[fid] += framebufferIn_shared[index + 5] * 0.0093f; + } +} + +//bloom effect : vertical gussian blur +__global__ +void gaussianBlurVert(int w, int h, glm::vec3 * framebufferIn, + glm::vec3 * framebufferOut, int framebufferEdgeOffset) +{ + //Same as horizontal + + //array size should be blocksize.x * (framebufferEdgeOffset + blocksize.y + framebufferEdgeOffset) + // 8 * (5 + 8 + 5) -> 144 + __shared__ glm::vec3 framebuffer_in_shared[144]; + + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + if (x < w && y < h) { + //framebuffer index + int fid = (x + framebufferEdgeOffset) + ((y + framebufferEdgeOffset) * (w + 2 * framebufferEdgeOffset)); + framebufferOut[fid] = glm::vec3(0.f); + + int numOfelementsOneRow = w + 2 * framebufferEdgeOffset; + + // blocksize.x + // 8 + int index = (threadIdx.y + 5) * 8 + threadIdx.x; + //write into shared memory + if (threadIdx.y == 0) { + // 40, 32, 24... -> 5 * blocksize.x + framebuffer_in_shared[index - 40] = framebufferIn[fid - 5 * numOfelementsOneRow]; + framebuffer_in_shared[index - 32] = framebufferIn[fid - 4 * numOfelementsOneRow]; + framebuffer_in_shared[index - 24] = framebufferIn[fid - 3 * numOfelementsOneRow]; + framebuffer_in_shared[index - 16] = framebufferIn[fid - 2 * numOfelementsOneRow]; + framebuffer_in_shared[index - 8] = framebufferIn[fid - 1 * numOfelementsOneRow]; + } + if (threadIdx.y == blockDim.y - 1) { + framebuffer_in_shared[index + 8] = framebufferIn[fid + 1 * numOfelementsOneRow]; + framebuffer_in_shared[index + 16] = framebufferIn[fid + 2 * numOfelementsOneRow]; + framebuffer_in_shared[index + 24] = framebufferIn[fid + 3 * numOfelementsOneRow]; + framebuffer_in_shared[index + 32] = framebufferIn[fid + 4 * numOfelementsOneRow]; + framebuffer_in_shared[index + 40] = framebufferIn[fid + 5 * numOfelementsOneRow]; + } + framebuffer_in_shared[index] = framebufferIn[fid]; + + __syncthreads(); + + //apply vertical gaussian blur, write into output frames + framebufferOut[fid] += framebuffer_in_shared[index - 40] * 0.0093f; + framebufferOut[fid] += framebuffer_in_shared[index - 32] * 0.028002f; + framebufferOut[fid] += framebuffer_in_shared[index - 24] * 0.065984f; + framebufferOut[fid] += framebuffer_in_shared[index - 16] * 0.121703f; + framebufferOut[fid] += framebuffer_in_shared[index - 8] * 0.175713f; + framebufferOut[fid] += framebuffer_in_shared[index] * 0.198596f; + framebufferOut[fid] += framebuffer_in_shared[index + 8] * 0.175713f; + framebufferOut[fid] += framebuffer_in_shared[index + 16] * 0.121703f; + framebufferOut[fid] += framebuffer_in_shared[index + 24] * 0.065984f; + framebufferOut[fid] += framebuffer_in_shared[index + 32] * 0.028002f; + framebufferOut[fid] += framebuffer_in_shared[index + 40] * 0.0093f; + } +} + +//downSample kernel +__global__ +void downScaleSample(int downScaleW, int downScaleH, int downScaleRatio, + int w, int h, + glm::vec3 * downScale_framebuffer, glm::vec3* framebuffer, + int framebufferEdgeOffset) +{ + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + + if (x < downScaleW && y < downScaleH) + { + int index = (x + framebufferEdgeOffset) + ((y + framebufferEdgeOffset) * (downScaleW + 2 * framebufferEdgeOffset)); + glm::vec3& f_col = downScale_framebuffer[index]; + f_col = glm::vec3(0.0f); + + float sampleCount = (float)downScaleRatio * (float)downScaleRatio; + int oriFramebufferX, oriFramebufferY; + int oriFramebufferInd; + //down sample and calculate average + for (int i = 0; i < downScaleRatio; i++) + { + for (int j = 0; j < downScaleRatio; j++) + { + oriFramebufferX = x * downScaleRatio + i; + oriFramebufferY = y * downScaleRatio + j; + oriFramebufferX = glm::clamp(oriFramebufferX, 0, w - 1); + oriFramebufferY = glm::clamp(oriFramebufferY, 0, h - 1); + oriFramebufferInd = oriFramebufferX + (oriFramebufferY * w); + f_col += framebuffer[oriFramebufferInd]; + } + } + f_col /= sampleCount; + } +} + +//bloom effect: brightness filter +__global__ +void brightnessFilter(int w, int h, glm::vec3 * framebufferIn, glm::vec3 * framebufferOut) +{ + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + if (x < w && y < h) + { + int index = x + y * w; + // input frame buffer at index, not reference + glm::vec3 f_in = framebufferIn[index]; + //calculate brightness of this frame buffer + float brightness = f_in.r * 0.2126f + f_in.g * 0.7152f + f_in.b * 0.0722f; + //framebufferOut[index] = brightness * f_in; + //brightness /= 2.0f; + //framebufferOut[index] = glm::vec3(brightness); + framebufferOut[index] = brightness * f_in; + } +} + +//bloom effect: final stage, combine buffers +__global__ +void bloomCombineFramebuffers(int w, int h, + glm::vec3 * mainFramebuffer, glm::vec3 * sideFramebuffer, + glm::vec3 * framebufferOut, + int sideFramebuffer_downScaleW, + int sideFramebuffer_downScaleRatio, int framebufferEdgeOffset) +{ + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + if (x < w && y < h) + { + int mainIdx = x + (y * w); + glm::vec3 main_col = mainFramebuffer[mainIdx]; + int sideIdx = ((x / sideFramebuffer_downScaleRatio) + framebufferEdgeOffset) + + (((y / sideFramebuffer_downScaleRatio) + framebufferEdgeOffset) * + (sideFramebuffer_downScaleW + 2 * framebufferEdgeOffset)); + glm::vec3 side_col = sideFramebuffer[sideIdx]; + + //combination factor + float k = 1.2f; + //comine color and write to output + framebufferOut[mainIdx] = main_col + k * side_col; + } +} + +//edge width in gaussian blur +int bloomGBedge = 5; /** * Called once at the beginning of the program to allocate memory. */ @@ -162,10 +471,32 @@ void rasterizeInit(int w, int h) { cudaFree(dev_framebuffer); cudaMalloc(&dev_framebuffer, width * height * sizeof(glm::vec3)); cudaMemset(dev_framebuffer, 0, width * height * sizeof(glm::vec3)); - + //initialize auxilliary frame buffers for bloom effect + cudaFree(dev_framebufferAux); + cudaMalloc(&dev_framebufferAux, width * height * sizeof(glm::vec3)); + cudaMemset(dev_framebufferAux, 0, width * height * sizeof(glm::vec3)); + + int downScaleRatio = DOWNSCALERATIO; + cudaFree(dev_framebufferDownScaled); + cudaMalloc(&dev_framebufferDownScaled, + ((width / downScaleRatio) + 2 * bloomGBedge) * ((height / downScaleRatio) + 2 * bloomGBedge) * sizeof(glm::vec3)); + cudaMemset(dev_framebufferDownScaled, 0, + ((width / downScaleRatio) + 2 * bloomGBedge) * ((height / downScaleRatio) + 2 * bloomGBedge) * sizeof(glm::vec3)); + + cudaFree(dev_framebufferDownScaledAux); + cudaMalloc(&dev_framebufferDownScaledAux, + ((width / downScaleRatio) + 2 * bloomGBedge) * ((height / downScaleRatio) + 2 * bloomGBedge) * sizeof(glm::vec3)); + cudaMemset(dev_framebufferDownScaledAux, 0, + ((width / downScaleRatio) + 2 * bloomGBedge) * ((height / downScaleRatio) + 2 * bloomGBedge) * sizeof(glm::vec3)); + + + cudaFree(dev_depth); cudaMalloc(&dev_depth, width * height * sizeof(int)); + cudaFree(dev_mutex); + cudaMalloc(&dev_mutex, width * height * sizeof(Fragment)); + cudaMemset(dev_mutex, 0, width * height * sizeof(Fragment)); checkCUDAError("rasterizeInit"); } @@ -628,7 +959,10 @@ void _vertexTransformAndAssembly( int numVertices, PrimitiveDevBufPointers primitive, glm::mat4 MVP, glm::mat4 MV, glm::mat3 MV_normal, - int width, int height) { + int width, int height, + glm::mat4 autoRotateMat4 //to enable auto rotation of the object + ) +{ // vertex id int vid = (blockIdx.x * blockDim.x) + threadIdx.x; @@ -639,9 +973,75 @@ void _vertexTransformAndAssembly( // Then divide the pos by its w element to transform into NDC space // Finally transform x and y to viewport space + //use shared memory to apply auto rotation + __shared__ glm::mat4 _autoRotateMat4; + if (threadIdx.x == 0) + { + _autoRotateMat4 = autoRotateMat4; + } + __syncthreads(); + // TODO: Apply vertex assembly here // Assemble all attribute arraies into the primitive array + VertexOut& thisDevVertexOut = primitive.dev_verticesOut[vid]; + ////multiply model-view-projective matrix + //glm::vec4 worldSpacePos = MVP * _autoRotateMat4* glm::vec4(primitive.dev_position[vid], 1.0f); + ////Projective divide + //glm::vec4 NDCpos = worldSpacePos * (1.0f / worldSpacePos.w); + ////transform into pixels + //glm::vec4 PixelPos = glm::vec4( + // (NDCpos.x + 1.0f) * (float)width / 2.0f, + // (1.0f - NDCpos.y) * (float)height / 2.0f, + // NDCpos.z, + // NDCpos.w); + ////write into vertexout struct + //thisDevVertexOut.pos = PixelPos; + ////Eye space pos + //glm::vec3 eyeSpacePos = glm::vec3(MV * _autoRotateMat4 *glm::vec4(primitive.dev_position[vid], 1.0f)); + //thisDevVertexOut.eyePos = eyeSpacePos; + + + int index = vid; + glm::vec4 vPos = glm::vec4(primitive.dev_position[index], 1.0f); + glm::vec4 vEyePos = MV * vPos; + thisDevVertexOut.eyePos = glm::vec3(vEyePos); + + vPos = MVP * vPos; + vPos /= vPos.w; + vPos.x = 0.5f * (float)width * (vPos.x + 1.0f); + vPos.y = 0.5f * (float)height * (1.0f - vPos.y); + vPos.z = -vPos.z; + + thisDevVertexOut.pos = vPos; + + + //Eye space normal + glm::vec3 eyeSpaceNormal = glm::normalize(MV_normal * glm::mat3(_autoRotateMat4) *primitive.dev_normal[vid]); + thisDevVertexOut.eyeNor = eyeSpaceNormal; + //texture coords + if (primitive.dev_texcoord0 != NULL) + { + thisDevVertexOut.texcoord0 = primitive.dev_texcoord0[vid]; + } + else + { + thisDevVertexOut.texcoord0 = glm::vec2(0.0f, 0.0f); + } + //diffuse texture + if (primitive.dev_diffuseTex != NULL) + { + thisDevVertexOut.dev_diffuseTex = primitive.dev_diffuseTex; + thisDevVertexOut.diffuseTexHeight = primitive.diffuseTexHeight; + thisDevVertexOut.diffuseTexWidth = primitive.diffuseTexWidth; + } + //else assign sequential RGB color + else + { + thisDevVertexOut.col = glm::vec3(0.f); + thisDevVertexOut.col[vid % 3] += 0.6f; + } + } } @@ -660,12 +1060,12 @@ void _primitiveAssembly(int numIndices, int curPrimitiveBeginId, Primitive* dev_ // TODO: uncomment the following code for a start // This is primitive assembly for triangles - //int pid; // id for cur primitives vector - //if (primitive.primitiveMode == TINYGLTF_MODE_TRIANGLES) { - // pid = iid / (int)primitive.primitiveType; - // dev_primitives[pid + curPrimitiveBeginId].v[iid % (int)primitive.primitiveType] - // = primitive.dev_verticesOut[primitive.dev_indices[iid]]; - //} + int pid; // id for cur primitives vector + if (primitive.primitiveMode == TINYGLTF_MODE_TRIANGLES) { + pid = iid / (int)primitive.primitiveType; + dev_primitives[pid + curPrimitiveBeginId].v[iid % (int)primitive.primitiveType] + = primitive.dev_verticesOut[primitive.dev_indices[iid]]; + } // TODO: other primitive types (point, line) @@ -673,12 +1073,381 @@ void _primitiveAssembly(int numIndices, int curPrimitiveBeginId, Primitive* dev_ } +////helper function: fill specific fragment buffer +//__device__ +//void fillFragmentBufferRoutine(Fragment& thisFragment, +// glm::vec3 p, glm::vec3 p1, glm::vec3 p2, glm::vec3 p3, +// VertexOut v1, VertexOut v2, VertexOut v3) +//{ +// int diffuseTexWidth = v1.diffuseTexWidth; +// int diffuseTexHeight = v1.diffuseTexHeight; +// TextureData* textureData = v1.dev_diffuseTex; +// +// //positions +// glm::vec3 eyeSpacePos_interpolated = vec3AttriInterpolate( +// p, p1, p2, p3, +// v1.eyePos, v2.eyePos, v3.eyePos +// ); +// thisFragment.eyePos = eyeSpacePos_interpolated; +// +// //normals +// glm::vec3 eyeSpaceNor_interpolated = vec3AttriInterpolate( +// p, p1, p2, p3, +// v1.eyeNor, v2.eyeNor, v3.eyeNor +// ); +// thisFragment.eyeNor = glm::normalize(eyeSpaceNor_interpolated); +// +// //uv coordinates +// glm::vec2 uv_interpolated = vec2AttriInterpolate( +// p, p1, p2, p3, +// v1.texcoord0, v2.texcoord0, v3.texcoord0 +// ); +// //get texture and color the fragment +// if (textureData != NULL) +// { +// //read texture and color the fragment with it +// glm::ivec2 texSpaceCoord = glm::ivec2(diffuseTexWidth * uv_interpolated.x, +// diffuseTexHeight * uv_interpolated.y); +// int texIdx = texSpaceCoord.x + diffuseTexWidth * texSpaceCoord.y; +// +// //read from color channels +// int textChannelsNum = 3; +// TextureData r = textureData[texIdx * textChannelsNum]; +// TextureData g = textureData[texIdx * textChannelsNum + 1]; +// TextureData b = textureData[texIdx * textChannelsNum + 2]; +// +// thisFragment.color = glm::vec3((float)r / 255.0f, +// (float)g / 255.0f, +// (float)b / 255.0f); +// //thisFragment.color = thisFragment.eyeNor; +// } +// else +// { +// thisFragment.color = glm::vec3(1.0f, 1.0f, 1.0f); +// } +// +//} + + + +/** + * fill in wireframe mode + */ +__device__ +void rasterizerFillWireFrame( + Fragment* fragmentBuffer, int* depth, + glm::vec3 p1, glm::vec3 p2, glm::vec3 p3, + int w, int h) +{ + + //2D Bounding Box + float xMin = fminf(p1.x, fminf(p2.x, p3.x)); + float xMax = fmaxf(p1.x, fmaxf(p2.x, p3.x)); + float yMin = fminf(p1.y, fminf(p2.y, p3.y)); + float yMax = fmaxf(p1.y, fmaxf(p2.y, p3.y)); + + //check bounding box with screen size + //get start and end indices in pixel from bounding box + float xStart = xMin < 0 ? 0 : (int)glm::floor(xMin); + float xEnd = xMax > w ? w : (int)glm::ceil(xMax); + + float yStart = yMin < 0 ? 0 : (int)glm::floor(yMin); + float yEnd = yMax > h ? h : (int)glm::ceil(yMax); + + glm::vec3 tris[3]; + tris[0] = p1; + tris[1] = p2; + tris[2] = p3; + + int fragmentIdx; + + float lineThickness = 0.08f; + glm::vec3 wireColor = glm::vec3(0.85f, 0.85f, 0.35f); + + + //scan pixel lines to fill the current primitive(triangle) + for (int i = yStart; i <= yEnd; i++) + { + for (int j = xStart; j <= xEnd; j++) + { + glm::vec3 barycentricCoord = calculateBarycentricCoordinate(tris, glm::vec2(j, i)); + if (glm::abs(barycentricCoord.x) < lineThickness) + { + if (barycentricCoord.y >= 0.0f && barycentricCoord.y <= 1.0f + &&barycentricCoord.z >= 0.0f && barycentricCoord.z <= 1.0f) + { + fragmentIdx = j + (i * w); + fragmentBuffer[fragmentIdx].color = wireColor; + } + } + if (glm::abs(barycentricCoord.y) < lineThickness) + { + if (barycentricCoord.x >= 0.0f && barycentricCoord.x <= 1.0f + &&barycentricCoord.z >= 0.0f && barycentricCoord.z <= 1.0f) + { + fragmentIdx = j + (i * w); + fragmentBuffer[fragmentIdx].color = wireColor; + } + } + if (glm::abs(barycentricCoord.z) < lineThickness) + { + if (barycentricCoord.y >= 0.0f && barycentricCoord.y <= 1.0f + &&barycentricCoord.x >= 0.0f && barycentricCoord.x <= 1.0f) + { + fragmentIdx = j + (i * w); + fragmentBuffer[fragmentIdx].color = wireColor; + } + } + + } + + } +} + +/** + * fill in point mode + */ +__device__ +void rasterizerFillPoints( + Fragment* fragmentBuffer, int* depth, + glm::vec3 p1, glm::vec3 p2, glm::vec3 p3, + int w, int h) +{ + + //2D Bounding Box + float xMin = fminf(p1.x, fminf(p2.x, p3.x)); + float xMax = fmaxf(p1.x, fmaxf(p2.x, p3.x)); + float yMin = fminf(p1.y, fminf(p2.y, p3.y)); + float yMax = fmaxf(p1.y, fmaxf(p2.y, p3.y)); + + //check bounding box with screen size + //get start and end indices in pixel from bounding box + float xStart = xMin < 0 ? 0 : (int)glm::floor(xMin); + float xEnd = xMax > w ? w : (int)glm::ceil(xMax); + + float yStart = yMin < 0 ? 0 : (int)glm::floor(yMin); + float yEnd = yMax > h ? h : (int)glm::ceil(yMax); + + glm::vec3 tris[3]; + tris[0] = p1; + tris[1] = p2; + tris[2] = p3; + + int fragmentIdx; + + float pointThickness = 0.08f; + glm::vec3 pointColor = glm::vec3(0.35f, 0.85f, 0.85f); + + + //scan pixel lines to fill the current primitive(triangle) + for (int i = yStart; i <= yEnd; i++) + { + for (int j = xStart; j <= xEnd; j++) + { + glm::vec3 barycentricCoord = calculateBarycentricCoordinate(tris, glm::vec2(j, i)); + + if (glm::abs(barycentricCoord.x - 1.0f) < pointThickness) + { + if (glm::abs(barycentricCoord.y) < pointThickness + &&glm::abs(barycentricCoord.z) < pointThickness) + { + fragmentIdx = j + (i * w); + fragmentBuffer[fragmentIdx].color = pointColor; + } + } + if (glm::abs(barycentricCoord.y - 1.0f) < pointThickness) + { + if (glm::abs(barycentricCoord.x) < pointThickness + &&glm::abs(barycentricCoord.z) < pointThickness) + { + fragmentIdx = j + (i * w); + fragmentBuffer[fragmentIdx].color = pointColor; + } + } + if (glm::abs(barycentricCoord.z - 1.0f) < pointThickness) + { + if (glm::abs(barycentricCoord.y) < pointThickness + &&glm::abs(barycentricCoord.x) < pointThickness) + { + fragmentIdx = j + (i * w); + fragmentBuffer[fragmentIdx].color = pointColor; + } + } + + } + + } +} + +/** +* fill fragment method, actually control over different renderModes +* call this function in rasterize function +*/ +__global__ +void rasterizeFill(int numPrimitives, int curPrimitiveBeginId, Primitive* primitives, + Fragment* fragmentBuffer, int * depth, int w, int h, int renderMode, int* dev_mutex) +{ + int primitiveIdx = (blockIdx.x * blockDim.x) + threadIdx.x; + if (primitiveIdx < numPrimitives) + { + Primitive& thisPrimitive = primitives[primitiveIdx + curPrimitiveBeginId]; + glm::vec3 p1(thisPrimitive.v[0].pos[0], thisPrimitive.v[0].pos[1], thisPrimitive.v[0].pos[2]); + glm::vec3 p2(thisPrimitive.v[1].pos[0], thisPrimitive.v[1].pos[1], thisPrimitive.v[1].pos[2]); + glm::vec3 p3(thisPrimitive.v[2].pos[0], thisPrimitive.v[2].pos[1], thisPrimitive.v[2].pos[2]); + + //Switch between different rendermodes + if (renderMode == RenderMode::Triangles) + { + //rasterizerFillTriangle(thisPrimitive, + // fragmentBuffer, depth, p1, p2, p3, w, h,dev_mutex); + + ////2D Bounding Box + //float xMin = fminf(p1.x, fminf(p2.x, p3.x)); + //float xMax = fmaxf(p1.x, fmaxf(p2.x, p3.x)); + //float yMin = fminf(p1.y, fminf(p2.y, p3.y)); + //float yMax = fmaxf(p1.y, fmaxf(p2.y, p3.y)); + + ////check bounding box with screen size + ////get start and end indices in pixel from bounding box + //float xStart = xMin < 0 ? 0 : (int)glm::floor(xMin); + //float xEnd = xMax > w ? w : (int)glm::ceil(xMax); + + //float yStart = yMin < 0 ? 0 : (int)glm::floor(yMin); + //float yEnd = yMax > h ? h : (int)glm::ceil(yMax); + + glm::vec3 tri[3] = { p1,p2,p3 }; + AABB boundingBox = getAABBForTriangleWithClamp(tri, (float)w, (float)h); + float xStart = boundingBox.min.x; + float xEnd = boundingBox.max.x; + + float yStart = boundingBox.min.y; + float yEnd = boundingBox.max.y; + //scan pixel lines to fill the current primitive(triangle) + for (int i = yStart; i <= yEnd; i++) + { + for (int j = xStart; j <= xEnd; j++) + { + // test if the pos is inside the current triangle + glm::vec3 baryCoords = calculateBarycentricCoordinate(tri, glm::vec2(j, i)); + if (isBarycentricCoordInBounds(baryCoords)) + { + + + /*if (isPosInTriangle(glm::vec3(j, i, 0.f), p1, p2, p3)) + {*/ + //interpolate depth + //depth value is in eye pace + //const float z_interpolated = depthInterpolate(glm::vec3(j, i, 0.f), p1, p2, p3); + const float z_interpolated = getZAtCoordinate(baryCoords, tri); + //CAUTIOUS(basic) + //We need to use atomic function to write depth value + //But it only supports integers, so we multiply it by a large number to get as many digits as we can + //Note: int32 is bwtween -2^31 and 2^31, which is 2147483648 + const int z_rounded = z_interpolated * INT_MAX; + + //atomic depth writing + int fragmentIdx = j + (i * w); + + bool isSet; + do { + isSet = (atomicCAS(&dev_mutex[fragmentIdx], 0, 1) == 0); + if (isSet) { + if (z_rounded < depth[fragmentIdx]) + { + depth[fragmentIdx] = z_rounded; + //The pos of current interested fragment + glm::vec3 p((float)j, (float)i, z_interpolated); + + //fillFragmentBufferRoutine(fragmentBuffer[fragmentIdx], + // p, p1, p2, p3, + // thisPrimitive.v[0], thisPrimitive.v[1], thisPrimitive.v[2]); + Fragment f; + + f.eyeNor = glm::normalize( + thisPrimitive.v[0].eyeNor * baryCoords.x + + thisPrimitive.v[1].eyeNor * baryCoords.y + + thisPrimitive.v[2].eyeNor * baryCoords.z); + + f.eyePos = glm::normalize( + thisPrimitive.v[0].eyePos * baryCoords.x + + thisPrimitive.v[1].eyePos * baryCoords.y + + thisPrimitive.v[2].eyePos * baryCoords.z); +#if PERSPECTIVE_INTERP + // get correct depth to use for interpolation + float newBaryDepth = 1.f / (1 / thisPrimitive.v[0].pos.z * baryCoords.x + + 1 / thisPrimitive.v[1].pos.z * baryCoords.y + + 1 / thisPrimitive.v[2].pos.z * baryCoords.z); + + f.color = glm::normalize(newBaryDepth * (thisPrimitive.v[0].col * baryCoords.x / thisPrimitive.v[0].pos.z + + thisPrimitive.v[1].col * baryCoords.y / thisPrimitive.v[1].pos.z + + thisPrimitive.v[2].col * baryCoords.z / thisPrimitive.v[2].pos.z)); +#else + f.color = glm::normalize(thisPrimitive.v[0].col * baryCoords.x + + thisPrimitive.v[1].col * baryCoords.y + + thisPrimitive.v[2].col * baryCoords.z); +#endif + + + + /***** Texture handling ****/ +#if TEXTURE +#if PERSPECTIVE_INTERP + // get the starting index of the color in the tex buffer + glm::vec2 uv = newBaryDepth * ( + thisPrimitive.v[0].texcoord0 * baryCoords.x / thisPrimitive.v[0].pos.z + + thisPrimitive.v[1].texcoord0 * baryCoords.y / thisPrimitive.v[1].pos.z + + thisPrimitive.v[2].texcoord0 * baryCoords.z / thisPrimitive.v[2].pos.z); + +#else + glm::vec2 uv = thisPrimitive.v[0].texcoord0 * baryCoords.x + + thisPrimitive.v[1].texcoord0 * baryCoords.y + + thisPrimitive.v[2].texcoord0 * baryCoords.z; +#endif + uv.x *= thisPrimitive.v[0].diffuseTexWidth; + uv.y *= thisPrimitive.v[0].diffuseTexHeight; +#if TEXTURE_BILINEAR + f.uvBilinear = uv; + f.diffuseTexWidth = thisPrimitive.v[0].diffuseTexWidth; + f.diffuseTexHeight = thisPrimitive.v[0].diffuseTexHeight; +#endif + // actual index into tex buffer is 1D, multip by 3 since every 3 floats (rgb) is a color + f.uvStart = ((int)uv.x + (int)uv.y * thisPrimitive.v[0].diffuseTexWidth) * 3; + f.dev_diffuseTex = thisPrimitive.v[0].dev_diffuseTex; +#endif + fragmentBuffer[fragmentIdx] = f; + } + } + if (isSet) + { + dev_mutex[fragmentIdx] = 0; + } + } while (!isSet); + + } + } + + } + + } + if (renderMode == RenderMode::Wireframe) + { + rasterizerFillWireFrame(fragmentBuffer, + depth, p1, p2, p3, w, h); + } + if (renderMode == RenderMode::Points) + { + rasterizerFillPoints(fragmentBuffer, depth, p1, p2, p3, w, h); + } + } +} /** * Perform rasterization. */ -void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const glm::mat3 MV_normal) { +void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const glm::mat3 MV_normal, + int renderMode, glm::mat4 autoRotateMat4, + bool bloomEffect) { int sideLength2d = 8; dim3 blockSize2d(sideLength2d, sideLength2d); dim3 blockCount2d((width - 1) / blockSize2d.x + 1, @@ -702,7 +1471,7 @@ void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const g dim3 numBlocksForVertices((p->numVertices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); dim3 numBlocksForIndices((p->numIndices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); - _vertexTransformAndAssembly << < numBlocksForVertices, numThreadsPerBlock >> >(p->numVertices, *p, MVP, MV, MV_normal, width, height); + _vertexTransformAndAssembly << < numBlocksForVertices, numThreadsPerBlock >> >(p->numVertices, *p, MVP, MV, MV_normal, width, height, autoRotateMat4); checkCUDAError("Vertex Processing"); cudaDeviceSynchronize(); _primitiveAssembly << < numBlocksForIndices, numThreadsPerBlock >> > @@ -723,15 +1492,77 @@ void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const g initDepth << > >(width, height, dev_depth); // TODO: rasterize + + curPrimitiveBeginId = 0; + dim3 numThreadsPerBlock(128); + auto it = mesh2PrimitivesMap.begin(); + auto itEnd = mesh2PrimitivesMap.end(); + for (; it != itEnd; ++it) + { + auto p = (it->second).begin(); + auto pEnd = (it->second).end(); + for (; p != pEnd; ++p) + { + //launch kernel to fill fragments + dim3 numBlocksForPrimitives((p->numPrimitives + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); + rasterizeFill<<>>(p->numPrimitives, curPrimitiveBeginId,dev_primitives, dev_fragmentBuffer, + dev_depth, width, height,renderMode, dev_mutex); + checkCUDAError("rasterizerFillPrimitive error!"); + cudaDeviceSynchronize(); + curPrimitiveBeginId += p->numPrimitives; + } + } // Copy depthbuffer colors into framebuffer - render << > >(width, height, dev_fragmentBuffer, dev_framebuffer); + // Modified: need more info in fragment shader such as light position + glm::vec3 singlePointLight(3.0f, 6.0f, -5.0f); + render << > >(width, height, dev_fragmentBuffer, dev_framebuffer, singlePointLight, renderMode, bloomGBedge); checkCUDAError("fragment shader"); - // Copy framebuffer into OpenGL buffer for OpenGL previewing - sendImageToPBO<<>>(pbo, width, height, dev_framebuffer); - checkCUDAError("copy render result to pbo"); + + if (bloomEffect) + { + //BLOOM EFFECT POST PROCESSING + + //BRIGHTNESS FITLER + brightnessFilter<<>>(width, height, dev_framebuffer, dev_framebufferAux); + //DOWN SCALE + int downScaleRatio = DOWNSCALERATIO; + dim3 blockCount2dDownScaled((width / downScaleRatio - 1) / blockSize2d.x + 1, + (height / downScaleRatio - 1) / blockSize2d.y + 1); + downScaleSample<<>>(width / downScaleRatio, height / downScaleRatio, downScaleRatio, + width, height, + dev_framebufferDownScaled, dev_framebufferAux, + bloomGBedge); + //GAUSSIAN BLUR + gaussianBlurHorizon << > > (width / downScaleRatio, height / downScaleRatio, + dev_framebufferDownScaled, dev_framebufferDownScaledAux, + bloomGBedge); + gaussianBlurVert << > > (width / downScaleRatio, height / downScaleRatio, + dev_framebufferDownScaledAux, dev_framebufferDownScaled, + bloomGBedge); + //FINAL COMBINE + bloomCombineFramebuffers << > > (width, height, + dev_framebuffer, dev_framebufferDownScaled, + dev_framebufferAux, + width / downScaleRatio, + downScaleRatio, + bloomGBedge); + checkCUDAError("bloom effect"); + + // Copy post-processed framebuffer into OpenGL buffer for OpenGL previewing + sendImageToPBO << > > (pbo, width, height, dev_framebufferAux, bloomGBedge, width,1); + checkCUDAError("copy render result to pbo"); + } + + else + { + // Copy framebuffer into OpenGL buffer for OpenGL previewing + sendImageToPBO << > > (pbo, width, height, dev_framebuffer,bloomGBedge, width, 1); + checkCUDAError("copy render result to pbo"); + } + } /** @@ -769,8 +1600,20 @@ void rasterizeFree() { cudaFree(dev_framebuffer); dev_framebuffer = NULL; + //free extra buffers + cudaFree(dev_framebufferAux); + dev_framebufferAux = NULL; + + cudaFree(dev_framebufferDownScaled); + dev_framebufferDownScaled = NULL; + + cudaFree(dev_framebufferDownScaledAux); + dev_framebufferDownScaled = NULL; + cudaFree(dev_depth); dev_depth = NULL; + + checkCUDAError("rasterize Free"); } diff --git a/src/rasterize.h b/src/rasterize.h index 560aae9..f9ba086 100644 --- a/src/rasterize.h +++ b/src/rasterize.h @@ -11,6 +11,7 @@ #include #include #include +#include "device_launch_parameters.h" namespace tinygltf{ class Scene; @@ -20,5 +21,6 @@ namespace tinygltf{ void rasterizeInit(int width, int height); void rasterizeSetBuffers(const tinygltf::Scene & scene); -void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const glm::mat3 MV_normal); +void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const glm::mat3 MV_normal, + int renderMode, glm::mat4 autoRotateMat4, bool bloomEffect); void rasterizeFree(); diff --git a/src/rasterizeTools.h b/src/rasterizeTools.h index 46c701e..17eecee 100644 --- a/src/rasterizeTools.h +++ b/src/rasterizeTools.h @@ -30,15 +30,15 @@ glm::vec3 multiplyMV(glm::mat4 m, glm::vec4 v) { * Finds the axis aligned bounding box for a given triangle. */ __host__ __device__ static -AABB getAABBForTriangle(const glm::vec3 tri[3]) { +AABB getAABBForTriangleWithClamp(const glm::vec3 tri[3], float width, float height) { AABB aabb; aabb.min = glm::vec3( - min(min(tri[0].x, tri[1].x), tri[2].x), - min(min(tri[0].y, tri[1].y), tri[2].y), + max(min(min(tri[0].x, tri[1].x), tri[2].x),0.0f), + max(min(min(tri[0].y, tri[1].y), tri[2].y),0.0f), min(min(tri[0].z, tri[1].z), tri[2].z)); aabb.max = glm::vec3( - max(max(tri[0].x, tri[1].x), tri[2].x), - max(max(tri[0].y, tri[1].y), tri[2].y), + min(max(max(tri[0].x, tri[1].x), tri[2].x),width), + min(max(max(tri[0].y, tri[1].y), tri[2].y),height), max(max(tri[0].z, tri[1].z), tri[2].z)); return aabb; } @@ -48,7 +48,7 @@ AABB getAABBForTriangle(const glm::vec3 tri[3]) { * Calculate the signed area of a given triangle. */ __host__ __device__ static -float calculateSignedArea(const glm::vec3 tri[3]) { +float calculateSignedArea(glm::vec3 tri[3]) { return 0.5 * ((tri[2].x - tri[0].x) * (tri[1].y - tri[0].y) - (tri[1].x - tri[0].x) * (tri[2].y - tri[0].y)); } @@ -57,7 +57,7 @@ float calculateSignedArea(const glm::vec3 tri[3]) { * Helper function for calculating barycentric coordinates. */ __host__ __device__ static -float calculateBarycentricCoordinateValue(glm::vec2 a, glm::vec2 b, glm::vec2 c, const glm::vec3 tri[3]) { +float calculateBarycentricCoordinateValue(glm::vec2 a, glm::vec2 b, glm::vec2 c, glm::vec3 tri[3]) { glm::vec3 baryTri[3]; baryTri[0] = glm::vec3(a, 0); baryTri[1] = glm::vec3(b, 0); @@ -70,7 +70,7 @@ float calculateBarycentricCoordinateValue(glm::vec2 a, glm::vec2 b, glm::vec2 c, * Calculate barycentric coordinates. */ __host__ __device__ static -glm::vec3 calculateBarycentricCoordinate(const glm::vec3 tri[3], glm::vec2 point) { +glm::vec3 calculateBarycentricCoordinate(glm::vec3 tri[3], glm::vec2 point) { float beta = calculateBarycentricCoordinateValue(glm::vec2(tri[0].x, tri[0].y), point, glm::vec2(tri[2].x, tri[2].y), tri); float gamma = calculateBarycentricCoordinateValue(glm::vec2(tri[0].x, tri[0].y), glm::vec2(tri[1].x, tri[1].y), point, tri); float alpha = 1.0 - beta - gamma; @@ -99,3 +99,112 @@ float getZAtCoordinate(const glm::vec3 barycentricCoord, const glm::vec3 tri[3]) + barycentricCoord.y * tri[1].z + barycentricCoord.z * tri[2].z); } + +//HELPER FUNCTIONS + +/** + * Helper function, determine whether a point is inside a triangle + */ +__device__ +bool isPosInTriangle(glm::vec3 p, glm::vec3 pos1, glm::vec3 pos2, glm::vec3 pos3) +{ + glm::vec3 v(p[0], p[1], 0.f); + glm::vec3 v1(pos1[0], pos1[1], 0.f); + glm::vec3 v2(pos2[0], pos2[1], 0.f); + glm::vec3 v3(pos3[0], pos3[1], 0.f); + + //compute areas for barycentric coordinates + float area = 0.5f * glm::length(glm::cross(v1 - v2, v3 - v2)); + float area1 = 0.5f * glm::length(glm::cross(v - v2, v3 - v2)); + float area2 = 0.5f * glm::length(glm::cross(v - v3, v1 - v3)); + float area3 = 0.5f * glm::length(glm::cross(v - v1, v2 - v1)); + + //check the sum of three (signed) areas against the whole area + return glm::abs(area1 + area2 + area3 - area) < 0.001f; +} + +/** + * Helper function, interpolate depth value + * Argument list: p is in screen space: (pixel.x, pixel.y, 0) + * pos1, pos2, pos3 are in combination of screen space and camera space: (pixel.x, pixel.y, eyeSpace.z) + */ + +__device__ +float depthInterpolate(glm::vec3 p, + glm::vec3 pos1, glm::vec3 pos2, glm::vec3 pos3) +{ + glm::vec3 v1(pos1[0], pos1[1], 0.f); + glm::vec3 v2(pos2[0], pos2[1], 0.f); + glm::vec3 v3(pos3[0], pos3[1], 0.f); + + //compute areas for barycentric coordinates + float area = 0.5f * glm::length(glm::cross(v1 - v2, v3 - v2)); + float area1 = 0.5f * glm::length(glm::cross(p - v2, v3 - v2)); + float area2 = 0.5f * glm::length(glm::cross(p - v3, v1 - v3)); + float area3 = 0.5f * glm::length(glm::cross(p - v1, v2 - v1)); + + //calculate interpolated z: + float z_inverse_interpolated = area1 / (pos1[2] * area) + area2 / (pos2[2] * area) + area3 / (pos3[2] * area); + return 1.0f / z_inverse_interpolated; +} + +/** + * Helper function, interpolate any vec2 vertex attributes, mainly for texture coords + * Argument list: p, pos1, pos2, pos3 are in combination of screen space and camera space: (pixel.x, pixel.y, eyeSpace.z) + * Attributes to interpolate are attri1, attri2 and attri3 + */ +__device__ +glm::vec2 vec2AttriInterpolate(glm::vec3 p, + glm::vec3 pos1, glm::vec3 pos2, glm::vec3 pos3, + glm::vec2 attri1, glm::vec2 attri2, glm::vec2 attri3) +{ + glm::vec3 v(p[0], p[1], 0.f); + glm::vec3 v1(pos1[0], pos1[1], 0.f); + glm::vec3 v2(pos2[0], pos2[1], 0.f); + glm::vec3 v3(pos3[0], pos3[1], 0.f); + + //compute areas for barycentric coordinates + float area = 0.5f * glm::length(glm::cross(v1 - v2, v3 - v2)); + float area1 = 0.5f * glm::length(glm::cross(v - v2, v3 - v2)); + float area2 = 0.5f * glm::length(glm::cross(v - v3, v1 - v3)); + float area3 = 0.5f * glm::length(glm::cross(v - v1, v2 - v1)); + + //formula: + // A/Z = Sigma (Ai/Zi * Si/S) + glm::vec2 AttriIntepolate = + (attri1 / pos1[2]) * (area1 / area) + + (attri2 / pos2[2]) * (area2 / area) + + (attri3 / pos3[2]) * (area3 / area); + return AttriIntepolate * p[2]; +} + +/** + * Helper function, interpolate any vec3 vertex attributes + * Argument list: p, pos1, pos2, pos3 are in combination of screen space and camera space: (pixel.x, pixel.y, eyeSpace.z) + * Attributes to interpolate are attri1, attri2 and attri3 + * basicall same as vec2 attribute interpolation, but with vec3 + */ +__device__ +glm::vec3 vec3AttriInterpolate(glm::vec3 p, + glm::vec3 pos1, glm::vec3 pos2, glm::vec3 pos3, + glm::vec3 attri1, glm::vec3 attri2, glm::vec3 attri3) +{ + glm::vec3 v(p[0], p[1], 0.f); + glm::vec3 v1(pos1[0], pos1[1], 0.f); + glm::vec3 v2(pos2[0], pos2[1], 0.f); + glm::vec3 v3(pos3[0], pos3[1], 0.f); + + //compute areas for barycentric coordinates + float area = 0.5f * glm::length(glm::cross(v1 - v2, v3 - v2)); + float area1 = 0.5f * glm::length(glm::cross(v - v2, v3 - v2)); + float area2 = 0.5f * glm::length(glm::cross(v - v3, v1 - v3)); + float area3 = 0.5f * glm::length(glm::cross(v - v1, v2 - v1)); + + //formula: + // A/Z = Sigma (Ai/Zi * Si/S) + glm::vec3 AttriIntepolate = + (attri1 / pos1[2]) * (area1 / area) + + (attri2 / pos2[2]) * (area2 / area) + + (attri3 / pos3[2]) * (area3 / area); + return AttriIntepolate * p[2]; +}