diff --git a/README.md b/README.md index 41b91f0..f9e1bd1 100644 --- a/README.md +++ b/README.md @@ -1,18 +1,95 @@ CUDA Rasterizer =============== -[CLICK ME FOR INSTRUCTION OF THIS PROJECT](./INSTRUCTION.md) +**University of Pennsylvania, CIS 565: GPU Programming and Architecture** -**University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 4** +**Anantha Srinivas** +[LinkedIn](https://www.linkedin.com/in/anantha-srinivas-00198958/), [Twitter](https://twitter.com/an2tha) -* (TODO) YOUR NAME HERE - * (TODO) [LinkedIn](), [personal website](), [twitter](), etc. -* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab) +**Tested on:** +* Windows 10, i7-8700 @ 3.20GHz 16GB, GTX 1080 8097MB (Personal) +* Built for Visual Studio 2017 using the v140 toolkit -### (TODO: Your README) +# Introduction -*DO NOT* leave the README to the last minute! It is a crucial part of the -project, and we will not be able to grade you without a good README. +![](renders/pipeline.png) + +This implementation of Rasterization is done on CUDA and split into these five important steps: + +* Vertex Transform and Assembly - This acts as a vertex shader and transforms the input vertices from world space to NDC space and then into screen space. Screen space transformation is done because we are deadling with pixels on the framebuffer. + +* Primitive Assembly - From the list of vertices, we construct primitives containing 1/2/3 vertices depending upon the geometry(point, line or triangle). To select the vertices, index buffer is used. + +* Core Rasterization - This performs a scan of pixel in the small rectangular boundary around each primitive. Then we check if the pixel lies inside or outside this primitive and set initialize the fragment accordingly. We also perform a depth check to ensure that the fragment is valid. + +* Pixel Shading - This acts as the pixel shader, where we calculate the color from the fragment. + +* Updating frame buffer - The frame buffer for each frame is updated to the screen. Normally in modern graphics API, there might be two or more framebuffer, so that while one is updating other one is shown. Also, if any super sampling is used here, the framebuffer is downscaled accordingly. + + +# Features + +The current version of the rasterizer supprots the following features: + +* Vertex Shading and primitive assembly with depth testing + +| Color | Normal | Depth | +| ------------- | ------------- | ------------- | +|![](renders/color.PNG) | ![](renders/normals.PNG) | ![](renders/depth.PNG) + +Simple, basic implementations of vertex transform, normal calculation and transform and depth checking. + +* Instancing + +![](renders/instancing.PNG) + +This is implemented similar to modern graphics APIs. Each vertex shader of each instance has the same pointer to the input vertex buffer. The output vertices are modified here based on the instance ID by offsetting in screen space. + +* Texture Mapping (2D) with bilinear filtering + +![](renders/render_cesium.PNG) + +Similar to vertex and normals, texture coordinates is calculated per pixel. It is then corrected to account for depth by performing perspective correction. + +In bilinear filtering, we sample not only the current texture coordinate but also the neighbours. This is then accompanied by mixing all the sampled points in a ratio determined by how far off is the texture coordinate from the sampled point. + +* SuperSample Antialiasing + +| No SSAA | SSAA 2 | +| ------------- | ------------- | +|![](renders/ssaa_1.PNG) | ![](renders/ssaa_2.PNG) | + +SuperSample AntiAliasing is the most basic implementation of AA. Here, the framebuffer size is incrased by the level of AA. We perform all the calculation with this enlarged framebuffer and then downscale it when updating to screen. It can be expensive both computationally and memory. + +* Color interpolation between points on a primitive + +![](renders/render_di.PNG) + +Interpolating color between vertices of a primitive using the bary centric coordinates. + + +* Points and Lines + +| Triangles | Lines | Points | +| ------------- | ------------- | ------------- | +| ![](renders/triangles.PNG) | ![](renders/lines.PNG) |![](renders/points.PNG) + +# Performance Analysis + +![](renders/performance_analysis.png) + +**Testing Conditions** +* Launched in Release Mode +* NVIDIA V-Sync turned off +* Run with NSight Profiler + +**General Analysis** +* From the performance graph is is fairly evident that the most expensive component of the rasterizer (in terms of time) is the raterizer. This is because of the fact that for each primitive, we need to scan pixel by pixel (computationally O(n^2)). The obvious way to minimize this would be to have some hardware implementations for some of the sub-functions. Other than that, we could load up the framebuffer and depth buffer into shared memory. Although not implemented, this could potentially speed up the kernel in magnitudes of 10X. +* We also see that the Render step, where we perform pixel shading is fairly same across all models. This is because the kernel is launched per pixel (number which remains same across all models). The calculation inside this pixel is fairly short. + +# Build Command + +`cmake .. -G "Visual Studio 15 2017 Win64" -DCUDA_TOOLKIT_ROOT_DIR="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v9.2" -T v140,cuda=9.2` ### Credits diff --git a/Report/New Microsoft PowerPoint Presentation.pptx b/Report/New Microsoft PowerPoint Presentation.pptx new file mode 100644 index 0000000..052a8cf Binary files /dev/null and b/Report/New Microsoft PowerPoint Presentation.pptx differ diff --git a/gltfs/BoxTextured.gltf b/gltfs/BoxTextured.gltf new file mode 100644 index 0000000..4a6ab9c --- /dev/null +++ b/gltfs/BoxTextured.gltf @@ -0,0 +1,181 @@ +{ + "asset": { + "generator": "COLLADA2GLTF", + "version": "2.0" + }, + "scene": 0, + "scenes": [ + { + "nodes": [ + 0 + ] + } + ], + "nodes": [ + { + "children": [ + 1 + ], + "matrix": [ + 1.0, + 0.0, + 0.0, + 0.0, + 0.0, + 0.0, + -1.0, + 0.0, + 0.0, + 1.0, + 0.0, + 0.0, + 0.0, + 0.0, + 0.0, + 1.0 + ] + }, + { + "mesh": 0 + } + ], + "meshes": [ + { + "primitives": [ + { + "attributes": { + "NORMAL": 1, + "POSITION": 2, + "TEXCOORD_0": 3 + }, + "indices": 0, + "mode": 4, + "material": 0 + } + ], + "name": "Mesh" + } + ], + "accessors": [ + { + "bufferView": 0, + "byteOffset": 0, + "componentType": 5123, + "count": 36, + "max": [ + 23 + ], + "min": [ + 0 + ], + "type": "SCALAR" + }, + { + "bufferView": 1, + "byteOffset": 0, + "componentType": 5126, + "count": 24, + "max": [ + 1.0, + 1.0, + 1.0 + ], + "min": [ + -1.0, + -1.0, + -1.0 + ], + "type": "VEC3" + }, + { + "bufferView": 1, + "byteOffset": 288, + "componentType": 5126, + "count": 24, + "max": [ + 0.5, + 0.5, + 0.5 + ], + "min": [ + -0.5, + -0.5, + -0.5 + ], + "type": "VEC3" + }, + { + "bufferView": 2, + "byteOffset": 0, + "componentType": 5126, + "count": 24, + "max": [ + 6.0, + 1.0 + ], + "min": [ + 0.0, + 0.0 + ], + "type": "VEC2" + } + ], + "materials": [ + { + "pbrMetallicRoughness": { + "baseColorTexture": { + "index": 0 + }, + "metallicFactor": 0.0 + }, + "name": "Texture" + } + ], + "textures": [ + { + "sampler": 0, + "source": 0 + } + ], + "images": [ + { + "uri": "" + } + ], + "samplers": [ + { + "magFilter": 9729, + "minFilter": 9986, + "wrapS": 10497, + "wrapT": 10497 + } + ], + "bufferViews": [ + { + "buffer": 0, + "byteOffset": 768, + "byteLength": 72, + "target": 34963 + }, + { + "buffer": 0, + "byteOffset": 0, + "byteLength": 576, + "byteStride": 12, + "target": 34962 + }, + { + "buffer": 0, + "byteOffset": 576, + "byteLength": 192, + "byteStride": 8, + "target": 34962 + } + ], + "buffers": [ + { + "byteLength": 840, + "uri": "data:application/octet-stream;base64,AAAAAAAAAAAAAIA/AAAAAAAAAAAAAIA/AAAAAAAAAAAAAIA/AAAAAAAAAAAAAIA/AACAPwAAAAAAAAAAAACAPwAAAAAAAAAAAACAPwAAAAAAAAAAAACAPwAAAAAAAAAAAAAAAAAAgD8AAAAAAAAAAAAAgD8AAAAAAAAAAAAAgD8AAAAAAAAAAAAAgD8AAAAAAAAAAAAAgL8AAAAAAAAAAAAAgL8AAAAAAAAAAAAAgL8AAAAAAAAAAAAAgL8AAAAAAACAvwAAAAAAAAAAAACAvwAAAAAAAAAAAACAvwAAAAAAAAAAAACAvwAAAAAAAAAAAAAAAAAAAAAAAIC/AAAAAAAAAAAAAIC/AAAAAAAAAAAAAIC/AAAAAAAAAAAAAIC/AAAAvwAAAL8AAAA/AAAAPwAAAL8AAAA/AAAAvwAAAD8AAAA/AAAAPwAAAD8AAAA/AAAAPwAAAD8AAAA/AAAAPwAAAL8AAAA/AAAAPwAAAD8AAAC/AAAAPwAAAL8AAAC/AAAAvwAAAD8AAAA/AAAAPwAAAD8AAAA/AAAAvwAAAD8AAAC/AAAAPwAAAD8AAAC/AAAAPwAAAL8AAAA/AAAAvwAAAL8AAAA/AAAAPwAAAL8AAAC/AAAAvwAAAL8AAAC/AAAAvwAAAL8AAAA/AAAAvwAAAD8AAAA/AAAAvwAAAL8AAAC/AAAAvwAAAD8AAAC/AAAAvwAAAL8AAAC/AAAAvwAAAD8AAAC/AAAAPwAAAL8AAAC/AAAAPwAAAD8AAAC/AADAQAAAAAAAAKBAAAAAAAAAwED+/38/AACgQP7/fz8AAIBAAAAAAAAAoEAAAAAAAACAQAAAgD8AAKBAAACAPwAAAEAAAAAAAACAPwAAAAAAAABAAACAPwAAgD8AAIA/AABAQAAAAAAAAIBAAAAAAAAAQEAAAIA/AACAQAAAgD8AAEBAAAAAAAAAAEAAAAAAAABAQAAAgD8AAABAAACAPwAAAAAAAAAAAAAAAP7/fz8AAIA/AAAAAAAAgD/+/38/AAABAAIAAwACAAEABAAFAAYABwAGAAUACAAJAAoACwAKAAkADAANAA4ADwAOAA0AEAARABIAEwASABEAFAAVABYAFwAWABUA" + } + ] +} diff --git a/gltfs/TextureTest.gltf b/gltfs/TextureTest.gltf new file mode 100644 index 0000000..891b3c5 --- /dev/null +++ b/gltfs/TextureTest.gltf @@ -0,0 +1,610 @@ +{ + "accessors" : [ + { + "bufferView" : 0, + "componentType" : 5121, + "count" : 6, + "max" : [ + 3 + ], + "min" : [ + 0 + ], + "type" : "SCALAR" + }, + { + "bufferView" : 1, + "componentType" : 5126, + "count" : 4, + "max" : [ + 1.2000000476837158, + 1.2000001668930054, + -5.205485820169997e-08 + ], + "min" : [ + 0.19999980926513672, + 0.20000004768371582, + -1.5933926533762133e-07 + ], + "type" : "VEC3" + }, + { + "bufferView" : 2, + "componentType" : 5126, + "count" : 4, + "max" : [ + -2.1316282072803006e-14, + 1.0728441424134871e-07, + 1.0 + ], + "min" : [ + -2.1316282072803006e-14, + 1.0728441424134871e-07, + 1.0 + ], + "type" : "VEC3" + }, + { + "name": "TopRight_TEXCOORD_0", + "bufferView" : 3, + "componentType" : 5126, + "count" : 4, + "max" : [ + 1.0, + 0.3999999761581421 + ], + "min" : [ + 0.6000000238418579, + 0.0 + ], + "type" : "VEC2" + }, + { + "bufferView" : 4, + "componentType" : 5121, + "count" : 6, + "max" : [ + 3 + ], + "min" : [ + 0 + ], + "type" : "SCALAR" + }, + { + "bufferView" : 5, + "componentType" : 5126, + "count" : 4, + "max" : [ + -0.20000006258487701, + 1.2000000476837158, + 1.2601539367551595e-07 + ], + "min" : [ + -1.2000001668930054, + 0.19999974966049194, + -3.3740951721483725e-07 + ], + "type" : "VEC3" + }, + { + "bufferView" : 6, + "componentType" : 5126, + "count" : 4, + "max" : [ + 1.7807019503379706e-07, + 2.853547016457014e-07, + 1.0 + ], + "min" : [ + 1.7807019503379706e-07, + 2.853547016457014e-07, + 1.0 + ], + "type" : "VEC3" + }, + { + "name": "TopLeft_TEXCOORD_0", + "bufferView" : 7, + "componentType" : 5126, + "count" : 4, + "max" : [ + 0.3999999463558197, + 0.3999999761581421 + ], + "min" : [ + 7.915305388905836e-08, + 0.0 + ], + "type" : "VEC2" + }, + { + "bufferView" : 8, + "componentType" : 5121, + "count" : 6, + "max" : [ + 3 + ], + "min" : [ + 0 + ], + "type" : "SCALAR" + }, + { + "bufferView" : 9, + "componentType" : 5126, + "count" : 4, + "max" : [ + 1.2000001668930054, + -0.1999996304512024, + 5.255118367131217e-07 + ], + "min" : [ + 0.2000000923871994, + -1.2000000476837158, + 6.20869826661874e-08 + ], + "type" : "VEC3" + }, + { + "bufferView" : 10, + "componentType" : 5126, + "count" : 4, + "max" : [ + -1.7807025187721592e-07, + 2.853545879588637e-07, + 1.0 + ], + "min" : [ + -1.7807025187721592e-07, + 2.853545879588637e-07, + 1.0 + ], + "type" : "VEC3" + }, + { + "name": "BottomRight_TEXCOORD_0", + "bufferView" : 11, + "componentType" : 5126, + "count" : 4, + "max" : [ + 0.9999998807907104, + 1.0 + ], + "min" : [ + 0.6000000834465027, + 0.599999874830246 + ], + "type" : "VEC2" + }, + { + "bufferView" : 12, + "componentType" : 5121, + "count" : 6, + "max" : [ + 3 + ], + "min" : [ + 0 + ], + "type" : "SCALAR" + }, + { + "bufferView" : 13, + "componentType" : 5126, + "count" : 4, + "max" : [ + 1.0000001192092896, + 1.0000001192092896, + -0.052591003477573395 + ], + "min" : [ + -1.0000001192092896, + -1.0000001192092896, + -0.05259115248918533 + ], + "type" : "VEC3" + }, + { + "bufferView" : 14, + "componentType" : 5126, + "count" : 4, + "max" : [ + 1.0658142730467397e-14, + -7.450581307466564e-08, + -1.0 + ], + "min" : [ + 1.0658142730467397e-14, + -7.450581307466564e-08, + -1.0 + ], + "type" : "VEC3" + }, + { + "bufferView" : 15, + "componentType" : 5121, + "count" : 6, + "max" : [ + 3 + ], + "min" : [ + 0 + ], + "type" : "SCALAR" + }, + { + "bufferView" : 16, + "componentType" : 5126, + "count" : 4, + "max" : [ + -0.19999969005584717, + -0.2000000774860382, + 5.255118367131217e-07 + ], + "min" : [ + -1.2000000476837158, + -1.2000001668930054, + 6.208701108789683e-08 + ], + "type" : "VEC3" + }, + { + "bufferView" : 17, + "componentType" : 5126, + "count" : 4, + "max" : [ + -8.526512829121202e-14, + 4.6342486825778906e-07, + 1.0 + ], + "min" : [ + -8.526512829121202e-14, + 4.6342486825778906e-07, + 1.0 + ], + "type" : "VEC3" + }, + { + "name": "BottomLeft_TEXCOORD_0", + "bufferView" : 18, + "componentType" : 5126, + "count" : 4, + "max" : [ + 0.40000009536743164, + 0.9999999208469248 + ], + "min" : [ + 0.0, + 0.6000000536441803 + ], + "type" : "VEC2" + } + ], + "asset" : { + "copyright" : "Copyright 2017-2018 Analytical Graphics, Inc., CC-BY 4.0 https://creativecommons.org/licenses/by/4.0/ - Mesh and textures by Ed Mackey.", + "generator" : "Khronos Blender glTF 2.0 exporter, plus hand-edits", + "version" : "2.0" + }, + "bufferViews" : [ + { + "buffer" : 0, + "byteLength" : 6, + "byteOffset" : 0, + "target" : 34963 + }, + { + "buffer" : 0, + "byteLength" : 48, + "byteOffset" : 8, + "target" : 34962 + }, + { + "buffer" : 0, + "byteLength" : 48, + "byteOffset" : 56, + "target" : 34962 + }, + { + "buffer" : 0, + "byteLength" : 32, + "byteOffset" : 104, + "target" : 34962 + }, + { + "buffer" : 0, + "byteLength" : 6, + "byteOffset" : 136, + "target" : 34963 + }, + { + "buffer" : 0, + "byteLength" : 48, + "byteOffset" : 144, + "target" : 34962 + }, + { + "buffer" : 0, + "byteLength" : 48, + "byteOffset" : 192, + "target" : 34962 + }, + { + "buffer" : 0, + "byteLength" : 32, + "byteOffset" : 240, + "target" : 34962 + }, + { + "buffer" : 0, + "byteLength" : 6, + "byteOffset" : 272, + "target" : 34963 + }, + { + "buffer" : 0, + "byteLength" : 48, + "byteOffset" : 280, + "target" : 34962 + }, + { + "buffer" : 0, + "byteLength" : 48, + "byteOffset" : 328, + "target" : 34962 + }, + { + "buffer" : 0, + "byteLength" : 32, + "byteOffset" : 376, + "target" : 34962 + }, + { + "buffer" : 0, + "byteLength" : 6, + "byteOffset" : 408, + "target" : 34963 + }, + { + "buffer" : 0, + "byteLength" : 48, + "byteOffset" : 416, + "target" : 34962 + }, + { + "buffer" : 0, + "byteLength" : 48, + "byteOffset" : 464, + "target" : 34962 + }, + { + "buffer" : 0, + "byteLength" : 6, + "byteOffset" : 512, + "target" : 34963 + }, + { + "buffer" : 0, + "byteLength" : 48, + "byteOffset" : 520, + "target" : 34962 + }, + { + "buffer" : 0, + "byteLength" : 48, + "byteOffset" : 568, + "target" : 34962 + }, + { + "buffer" : 0, + "byteLength" : 32, + "byteOffset" : 616, + "target" : 34962 + } + ], + "buffers" : [ + { + "byteLength" : 648, + "uri" : "data:application/octet-stream;base64,AAECAwEAAACamZk/2sxMPuySX7PAzEw+mZmZP9gWK7TKzEw+0MxMPuySX7OYmZk/m5mZP9gWK7QAAMCoO2TmMwAAgD8AAMCoO2TmMwAAgD8AAMCoO2TmMwAAgD8AAMCoO2TmMwAAgD8AAIA/zMzMPpqZGT8AAAAAmpkZP8zMzD4AAIA/AAAAAAABAgMBAAAA0cxMvsnMTD7okl+zm5mZv5iZmT/aFiu0mZmZv7zMTD7ZTgc03sxMvpqZmT82JbW0kDM/NNoymTQAAIA/kDM/NNoymTQAAIA/kDM/NNoymTQAAIA/kDM/NNoymTQAAIA/y8zMPszMzD7a+qkzAAAAANr6qTPMzMw+y8zMPgAAAAAAAQIDAQAAAJuZmT+YmZm/5hANNdPMTD7HzEy+rlSFM+bMTD6amZm/Boi6NJmZmT+0zEy+8u6ANJQzP7TWMpk0AACAP5QzP7TWMpk0AACAP5QzP7TWMpk0AACAP5QzP7TWMpk0AACAP/3/fz8AAIA/m5kZP5iZGT+cmRk/AACAP/7/fz+YmRk/AAECAAMBAAABAIC//f9/P9JpV70BAIA//f9/v6ppV739/3+/AQCAv6ppV739/38/AQCAP9JpV70CAEAoAQCgswAAgL8CAEAoAQCgswAAgL8CAEAoAQCgswAAgL8CAEAoAQCgswAAgL8AAQIDAQAAALjMTL6ZmZm/5hANNZqZmb/izEy+uFSFM5iZmb+bmZm/5hANNcjMTL7SzEy+slSFMwAAwKmhzPg0AACAPwAAwKmhzPg0AACAPwAAwKmhzPg0AACAPwAAwKmhzPg0AACAP9DMzD7//38/AACAM5qZGT8AAAAA//9/P83MzD6amRk/" + } + ], + "images" : [ + { + "uri" : "" + } + ], + "materials" : [ + { + "name" : "BackPlaneMat", + "doubleSided": true, + "pbrMetallicRoughness" : { + "baseColorFactor" : [ + 0.16000001668930075, + 0.16000001668930075, + 0.16000001668930075, + 1.0 + ], + "metallicFactor" : 0.0 + } + }, + { + "name" : "BottomLeftMat", + "doubleSided": true, + "pbrMetallicRoughness" : { + "baseColorFactor" : [ + 0.0, + 0.16000000476837162, + 0.800000011920929, + 1.0 + ], + "baseColorTexture" : { + "index" : 0 + }, + "metallicFactor" : 0.0 + } + }, + { + "name" : "BottomRightMat", + "doubleSided": true, + "pbrMetallicRoughness" : { + "baseColorFactor" : [ + 0.0, + 0.800000011920929, + 0.0, + 1.0 + ], + "baseColorTexture" : { + "index" : 0 + }, + "metallicFactor" : 0.0 + } + }, + { + "name" : "TopLeftMat", + "doubleSided": true, + "pbrMetallicRoughness" : { + "baseColorFactor" : [ + 0.800000011920929, + 0.800000011920929, + 0.0, + 1.0 + ], + "baseColorTexture" : { + "index" : 0 + }, + "metallicFactor" : 0.0 + } + }, + { + "name" : "TopRightMat", + "doubleSided": true, + "pbrMetallicRoughness" : { + "baseColorFactor" : [ + 0.800000011920929, + 0.08000000238418581, + 0.0, + 1.0 + ], + "baseColorTexture" : { + "index" : 0 + }, + "metallicFactor" : 0.0 + } + } + ], + "meshes" : [ + { + "name" : "TopRightMesh", + "primitives" : [ + { + "attributes" : { + "NORMAL" : 2, + "POSITION" : 1, + "TEXCOORD_0" : 3 + }, + "indices" : 0, + "material" : 4 + } + ] + }, + { + "name" : "TopLeftMesh", + "primitives" : [ + { + "attributes" : { + "NORMAL" : 6, + "POSITION" : 5, + "TEXCOORD_0" : 7 + }, + "indices" : 4, + "material" : 3 + } + ] + }, + { + "name" : "BottomRightMesh", + "primitives" : [ + { + "attributes" : { + "NORMAL" : 10, + "POSITION" : 9, + "TEXCOORD_0" : 11 + }, + "indices" : 8, + "material" : 2 + } + ] + }, + { + "name" : "BackPlaneMesh", + "primitives" : [ + { + "attributes" : { + "NORMAL" : 14, + "POSITION" : 13 + }, + "indices" : 12, + "material" : 0 + } + ] + }, + { + "name" : "BottomLeftMesh", + "primitives" : [ + { + "attributes" : { + "NORMAL" : 17, + "POSITION" : 16, + "TEXCOORD_0" : 18 + }, + "indices" : 15, + "material" : 1 + } + ] + } + ], + "nodes" : [ + { + "mesh" : 3, + "name" : "BackPlane" + }, + { + "mesh" : 4, + "name" : "BottomLeftObj" + }, + { + "mesh" : 2, + "name" : "BottomRightObj" + }, + { + "mesh" : 1, + "name" : "TopLeftObj" + }, + { + "mesh" : 0, + "name" : "TopRightObj" + } + ], + "samplers" : [ + {} + ], + "scene" : 0, + "scenes" : [ + { + "name" : "Scene", + "nodes" : [ + 0, + 2, + 1, + 4, + 3 + ] + } + ], + "textures" : [ + { + "sampler" : 0, + "source" : 0 + } + ] +} diff --git a/renders/color.PNG b/renders/color.PNG new file mode 100644 index 0000000..35100d9 Binary files /dev/null and b/renders/color.PNG differ diff --git a/renders/depth.PNG b/renders/depth.PNG new file mode 100644 index 0000000..5e22ccf Binary files /dev/null and b/renders/depth.PNG differ diff --git a/renders/instancing.PNG b/renders/instancing.PNG new file mode 100644 index 0000000..e0e3c20 Binary files /dev/null and b/renders/instancing.PNG differ diff --git a/renders/lines.PNG b/renders/lines.PNG new file mode 100644 index 0000000..75a8824 Binary files /dev/null and b/renders/lines.PNG differ diff --git a/renders/lines_2.PNG b/renders/lines_2.PNG new file mode 100644 index 0000000..347704e Binary files /dev/null and b/renders/lines_2.PNG differ diff --git a/renders/normals.PNG b/renders/normals.PNG new file mode 100644 index 0000000..90656da Binary files /dev/null and b/renders/normals.PNG differ diff --git a/renders/performance_analysis.png b/renders/performance_analysis.png new file mode 100644 index 0000000..a002e15 Binary files /dev/null and b/renders/performance_analysis.png differ diff --git a/renders/pipeline.png b/renders/pipeline.png new file mode 100644 index 0000000..b55787c Binary files /dev/null and b/renders/pipeline.png differ diff --git a/renders/points.PNG b/renders/points.PNG new file mode 100644 index 0000000..d5fda06 Binary files /dev/null and b/renders/points.PNG differ diff --git a/renders/render_cesium.PNG b/renders/render_cesium.PNG new file mode 100644 index 0000000..518a296 Binary files /dev/null and b/renders/render_cesium.PNG differ diff --git a/renders/render_di.PNG b/renders/render_di.PNG new file mode 100644 index 0000000..65af634 Binary files /dev/null and b/renders/render_di.PNG differ diff --git a/renders/render_duck.PNG b/renders/render_duck.PNG new file mode 100644 index 0000000..706aa54 Binary files /dev/null and b/renders/render_duck.PNG differ diff --git a/renders/ssaa_1.PNG b/renders/ssaa_1.PNG new file mode 100644 index 0000000..d11a59b Binary files /dev/null and b/renders/ssaa_1.PNG differ diff --git a/renders/ssaa_2.PNG b/renders/ssaa_2.PNG new file mode 100644 index 0000000..b6c5339 Binary files /dev/null and b/renders/ssaa_2.PNG differ diff --git a/renders/triangles.PNG b/renders/triangles.PNG new file mode 100644 index 0000000..28774a4 Binary files /dev/null and b/renders/triangles.PNG differ diff --git a/renders/triangles_2.PNG b/renders/triangles_2.PNG new file mode 100644 index 0000000..1cb1e72 Binary files /dev/null and b/renders/triangles_2.PNG differ diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt index a57f69f..779ca22 100644 --- a/src/CMakeLists.txt +++ b/src/CMakeLists.txt @@ -1,4 +1,6 @@ set(SOURCE_FILES + "CameraControls.h" + "CameraControls.cpp" "rasterize.cu" "rasterize.h" "rasterizeTools.h" @@ -6,5 +8,5 @@ set(SOURCE_FILES cuda_add_library(src ${SOURCE_FILES} - OPTIONS -arch=sm_20 + OPTIONS -arch=sm_60 ) diff --git a/src/main.cpp b/src/main.cpp index 7986959..cb4fae1 100644 --- a/src/main.cpp +++ b/src/main.cpp @@ -14,6 +14,7 @@ #define TINYGLTF_LOADER_IMPLEMENTATION #include + //------------------------------- //-------------MAIN-------------- //------------------------------- @@ -45,11 +46,12 @@ int main(int argc, char **argv) { } if (!ret) { + printf("Failed to parse glTF\n"); + getchar(); return -1; } - frame = 0; seconds = time(NULL); fpstracker = 0; @@ -116,12 +118,16 @@ void runCuda() { * glm::rotate(y_angle, glm::vec3(0.0f, 1.0f, 0.0f)); glm::mat3 MV_normal = glm::transpose(glm::inverse(glm::mat3(V) * glm::mat3(M))); + glm::mat3 M_inverseTranspose = glm::inverse(glm::transpose(glm::mat3(M))); glm::mat4 MV = V * M; glm::mat4 MVP = P * MV; - cudaGLMapBufferObject((void **)&dptr, pbo); - rasterize(dptr, MVP, MV, MV_normal); - cudaGLUnmapBufferObject(pbo); + + glm::vec3 eyePos = glm::vec3(x_trans, y_trans, z_trans); + + cudaGLMapBufferObject((void **)&dptr, pbo); + rasterize(dptr, MVP, MV, MV_normal, M_inverseTranspose, eyePos); + cudaGLUnmapBufferObject(pbo); frame++; fpstracker++; diff --git a/src/rasterize.cu b/src/rasterize.cu index 1262a09..c9907ce 100644 --- a/src/rasterize.cu +++ b/src/rasterize.cu @@ -20,101 +20,129 @@ namespace { - typedef unsigned short VertexIndex; - typedef glm::vec3 VertexAttributePosition; - typedef glm::vec3 VertexAttributeNormal; - typedef glm::vec2 VertexAttributeTexcoord; - typedef unsigned char TextureData; - - typedef unsigned char BufferByte; - - enum PrimitiveType{ - Point = 1, - Line = 2, - Triangle = 3 - }; - - struct VertexOut { - glm::vec4 pos; - - // TODO: add new attributes to your VertexOut - // The attributes listed below might be useful, - // but always feel free to modify on your own - - glm::vec3 eyePos; // eye space position used for shading - glm::vec3 eyeNor; // eye space normal used for shading, cuz normal will go wrong after perspective transformation - // glm::vec3 col; - glm::vec2 texcoord0; - TextureData* dev_diffuseTex = NULL; - // int texWidth, texHeight; - // ... - }; - - struct Primitive { - PrimitiveType primitiveType = Triangle; // C++ 11 init - VertexOut v[3]; - }; - - struct Fragment { - glm::vec3 color; - - // TODO: add new attributes to your Fragment - // The attributes listed below might be useful, - // but always feel free to modify on your own - - // glm::vec3 eyePos; // eye space position used for shading - // glm::vec3 eyeNor; - // VertexAttributeTexcoord texcoord0; - // TextureData* dev_diffuseTex; - // ... - }; - - struct PrimitiveDevBufPointers { - int primitiveMode; //from tinygltfloader macro - PrimitiveType primitiveType; - int numPrimitives; - int numIndices; - int numVertices; - - // Vertex In, const after loaded - VertexIndex* dev_indices; - VertexAttributePosition* dev_position; - VertexAttributeNormal* dev_normal; - VertexAttributeTexcoord* dev_texcoord0; - - // Materials, add more attributes when needed - TextureData* dev_diffuseTex; - int diffuseTexWidth; - int diffuseTexHeight; - // TextureData* dev_specularTex; - // TextureData* dev_normalTex; - // ... - - // Vertex Out, vertex used for rasterization, this is changing every frame - VertexOut* dev_verticesOut; - - // TODO: add more attributes when needed - }; + typedef unsigned short VertexIndex; + typedef glm::vec3 VertexAttributePosition; + typedef glm::vec3 VertexAttributeNormal; + typedef glm::vec2 VertexAttributeTexcoord; + typedef unsigned char TextureData; + + typedef unsigned char BufferByte; + +#define BYTES_PER_PIXEL 3 + +#define NUM_INSTANCES 1 + +#define AA_SUPER_SAMPLE + //#define AA_MULTI_SAMPLE + +#define SSAA_LEVEL 1 +#define MSAA_LEVEL 2 + +#define NO_INTERPOLATION +#define BI_LINEAR_INTERPOLATION + + enum PrimitiveType + { + Point = 1, + Line = 2, + Triangle = 3 + }; + + struct VertexOut + { + glm::vec4 pos; + glm::vec3 eyePos; // eye space position used for shading + glm::vec3 eyeNor; // eye space normal used for shading, cuz normal will go wrong after perspective transformation + + glm::vec3 color; + glm::vec2 texcoord0; + }; + + struct Primitive + { + int instanceId; + PrimitiveType primitiveType = Triangle; // C++ 11 init + VertexOut v[3]; + TextureData* dev_diffuseTex; + int diffuseTexWidth; + int diffuseTexHeight; + }; + + struct Fragment + { + glm::vec3 color; + glm::vec3 eyePos; + glm::vec3 eyeNor; + VertexAttributeTexcoord texcoord0; + TextureData* dev_diffuseTex; + int diffuseTexWidth; + int diffuseTexHeight; + }; + + struct PrimitiveDevBufPointers + { + int primitiveMode; //from tinygltfloader macro + + PrimitiveType primitiveType; + int numPrimitives; + int numIndices; + int numVertices; + int numInstances; + + int vertexOutStartIndex; + + + // Vertex In, const after loaded + VertexIndex* dev_indices; + VertexAttributePosition* dev_position; + VertexAttributeNormal* dev_normal; + VertexAttributeTexcoord* dev_texcoord0; + + // Materials, add more attributes when needed + TextureData* dev_diffuseTex; + int diffuseTexWidth; + int diffuseTexHeight; + // TextureData* dev_specularTex; + // TextureData* dev_normalTex; + // ... + + // Vertex Out, vertex used for rasterization, this is changing every frame + VertexOut* dev_verticesOut; + + // TODO: add more attributes when needed + }; } static std::map> mesh2PrimitivesMap; +static int ssaaWidth = 0; +static int ssaaHeight = 0; + +static int msaaWidth = 0; +static int msaaHeight = 0; static int width = 0; static int height = 0; +static int originalWidth = 0; +static int originalHeight = 0; + static int totalNumPrimitives = 0; static Primitive *dev_primitives = NULL; + static Fragment *dev_fragmentBuffer = NULL; + static glm::vec3 *dev_framebuffer = NULL; +static glm::vec3 *dev_AAFrameBuffer = NULL; -static int * dev_depth = NULL; // you might need this buffer when doing depth test +static float * dev_depth = NULL; // you might need this buffer when doing depth test +static unsigned int* dev_mutex = NULL; /** * Kernel that writes the image to the OpenGL PBO directly. */ -__global__ +__global__ void sendImageToPBO(uchar4 *pbo, int w, int h, glm::vec3 *image) { int x = (blockIdx.x * blockDim.x) + threadIdx.x; int y = (blockIdx.y * blockDim.y) + threadIdx.y; @@ -133,53 +161,65 @@ void sendImageToPBO(uchar4 *pbo, int w, int h, glm::vec3 *image) { } } -/** -* Writes fragment colors to the framebuffer -*/ -__global__ -void render(int w, int h, Fragment *fragmentBuffer, glm::vec3 *framebuffer) { - int x = (blockIdx.x * blockDim.x) + threadIdx.x; - int y = (blockIdx.y * blockDim.y) + threadIdx.y; - int index = x + (y * w); - if (x < w && y < h) { - framebuffer[index] = fragmentBuffer[index].color; - - // TODO: add your fragment shader code here - - } -} /** * Called once at the beginning of the program to allocate memory. */ -void rasterizeInit(int w, int h) { - width = w; - height = h; - cudaFree(dev_fragmentBuffer); - cudaMalloc(&dev_fragmentBuffer, width * height * sizeof(Fragment)); - cudaMemset(dev_fragmentBuffer, 0, width * height * sizeof(Fragment)); +void rasterizeInit(int w, int h) +{ + originalWidth = w; + originalHeight = h; + + ssaaWidth = w * SSAA_LEVEL; + ssaaHeight = h * SSAA_LEVEL; + + msaaWidth = w * MSAA_LEVEL; + msaaHeight = h * MSAA_LEVEL; + +#ifdef AA_SUPER_SAMPLE + width = ssaaWidth; + height = ssaaHeight; +#endif // AA_SUPER_SAMPLE + +#ifdef AA_MULTI_SAMPLE + width = msaaWidth; + height = msaaHeight; +#endif // AA_MULTI_SAMPLE + + cudaFree(dev_fragmentBuffer); + cudaMalloc(&dev_fragmentBuffer, width * height * sizeof(Fragment)); + cudaMemset(dev_fragmentBuffer, 0, width * height * sizeof(Fragment)); + cudaFree(dev_framebuffer); - cudaMalloc(&dev_framebuffer, width * height * sizeof(glm::vec3)); + cudaMalloc(&dev_framebuffer, width * height * sizeof(glm::vec3)); cudaMemset(dev_framebuffer, 0, width * height * sizeof(glm::vec3)); - - cudaFree(dev_depth); - cudaMalloc(&dev_depth, width * height * sizeof(int)); - checkCUDAError("rasterizeInit"); + cudaFree(dev_AAFrameBuffer); + cudaMalloc(&dev_AAFrameBuffer, originalWidth * originalHeight * sizeof(glm::vec3)); + cudaMemset(dev_AAFrameBuffer, 0, originalWidth * originalHeight * sizeof(glm::vec3)); + + cudaFree(dev_depth); + cudaMalloc(&dev_depth, width * height * sizeof(float)); + + cudaFree(dev_mutex); + cudaMalloc(&dev_mutex, width * height * sizeof(unsigned int)); + cudaMemset(dev_mutex, 0, width * height * sizeof(unsigned int)); + + checkCUDAError("rasterizeInit"); } __global__ -void initDepth(int w, int h, int * depth) +void initDepth(int w, int h, float * depth) { - int x = (blockIdx.x * blockDim.x) + threadIdx.x; - int y = (blockIdx.y * blockDim.y) + threadIdx.y; - - if (x < w && y < h) - { - int index = x + (y * w); - depth[index] = INT_MAX; - } + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + + if (x < w && y < h) + { + int index = x + (y * w); + depth[index] = FLT_MAX; + } } @@ -187,550 +227,1098 @@ void initDepth(int w, int h, int * depth) * kern function with support for stride to sometimes replace cudaMemcpy * One thread is responsible for copying one component */ -__global__ +__global__ void _deviceBufferCopy(int N, BufferByte* dev_dst, const BufferByte* dev_src, int n, int byteStride, int byteOffset, int componentTypeByteSize) { - - // Attribute (vec3 position) - // component (3 * float) - // byte (4 * byte) - - // id of component - int i = (blockIdx.x * blockDim.x) + threadIdx.x; - - if (i < N) { - int count = i / n; - int offset = i - count * n; // which component of the attribute - - for (int j = 0; j < componentTypeByteSize; j++) { - - dev_dst[count * componentTypeByteSize * n - + offset * componentTypeByteSize - + j] - - = - - dev_src[byteOffset - + count * (byteStride == 0 ? componentTypeByteSize * n : byteStride) - + offset * componentTypeByteSize - + j]; - } - } - + + // Attribute (vec3 position) + // component (3 * float) + // byte (4 * byte) + + // id of component + int i = (blockIdx.x * blockDim.x) + threadIdx.x; + + if (i < N) { + int count = i / n; + int offset = i - count * n; // which component of the attribute + + for (int j = 0; j < componentTypeByteSize; j++) { + + dev_dst[count * componentTypeByteSize * n + + offset * componentTypeByteSize + + j] + + = + + dev_src[byteOffset + + count * (byteStride == 0 ? componentTypeByteSize * n : byteStride) + + offset * componentTypeByteSize + + j]; + } + } + } __global__ void _nodeMatrixTransform( - int numVertices, - VertexAttributePosition* position, - VertexAttributeNormal* normal, - glm::mat4 MV, glm::mat3 MV_normal) { - - // vertex id - int vid = (blockIdx.x * blockDim.x) + threadIdx.x; - if (vid < numVertices) { - position[vid] = glm::vec3(MV * glm::vec4(position[vid], 1.0f)); - normal[vid] = glm::normalize(MV_normal * normal[vid]); - } + int numVertices, + VertexAttributePosition* position, + VertexAttributeNormal* normal, + glm::mat4 MV, glm::mat3 MV_normal) { + + // vertex id + int vid = (blockIdx.x * blockDim.x) + threadIdx.x; + if (vid < numVertices) { + position[vid] = glm::vec3(MV * glm::vec4(position[vid], 1.0f)); + normal[vid] = glm::normalize(MV_normal * normal[vid]); + } } glm::mat4 getMatrixFromNodeMatrixVector(const tinygltf::Node & n) { - - glm::mat4 curMatrix(1.0); - - const std::vector &m = n.matrix; - if (m.size() > 0) { - // matrix, copy it - - for (int i = 0; i < 4; i++) { - for (int j = 0; j < 4; j++) { - curMatrix[i][j] = (float)m.at(4 * i + j); - } - } - } else { - // no matrix, use rotation, scale, translation - - if (n.translation.size() > 0) { - curMatrix[3][0] = n.translation[0]; - curMatrix[3][1] = n.translation[1]; - curMatrix[3][2] = n.translation[2]; - } - - if (n.rotation.size() > 0) { - glm::mat4 R; - glm::quat q; - q[0] = n.rotation[0]; - q[1] = n.rotation[1]; - q[2] = n.rotation[2]; - - R = glm::mat4_cast(q); - curMatrix = curMatrix * R; - } - - if (n.scale.size() > 0) { - curMatrix = curMatrix * glm::scale(glm::vec3(n.scale[0], n.scale[1], n.scale[2])); - } - } - - return curMatrix; + + glm::mat4 curMatrix(1.0); + + const std::vector &m = n.matrix; + if (m.size() > 0) { + // matrix, copy it + + for (int i = 0; i < 4; i++) { + for (int j = 0; j < 4; j++) { + curMatrix[i][j] = (float)m.at(4 * i + j); + } + } + } + else { + // no matrix, use rotation, scale, translation + + if (n.translation.size() > 0) { + curMatrix[3][0] = n.translation[0]; + curMatrix[3][1] = n.translation[1]; + curMatrix[3][2] = n.translation[2]; + } + + if (n.rotation.size() > 0) { + glm::mat4 R; + glm::quat q; + q[0] = n.rotation[0]; + q[1] = n.rotation[1]; + q[2] = n.rotation[2]; + + R = glm::mat4_cast(q); + curMatrix = curMatrix * R; + } + + if (n.scale.size() > 0) { + curMatrix = curMatrix * glm::scale(glm::vec3(n.scale[0], n.scale[1], n.scale[2])); + } + } + + return curMatrix; } -void traverseNode ( - std::map & n2m, - const tinygltf::Scene & scene, - const std::string & nodeString, - const glm::mat4 & parentMatrix - ) +void traverseNode( + std::map & n2m, + const tinygltf::Scene & scene, + const std::string & nodeString, + const glm::mat4 & parentMatrix +) { - const tinygltf::Node & n = scene.nodes.at(nodeString); - glm::mat4 M = parentMatrix * getMatrixFromNodeMatrixVector(n); - n2m.insert(std::pair(nodeString, M)); + const tinygltf::Node & n = scene.nodes.at(nodeString); + glm::mat4 M = parentMatrix * getMatrixFromNodeMatrixVector(n); + n2m.insert(std::pair(nodeString, M)); - auto it = n.children.begin(); - auto itEnd = n.children.end(); + auto it = n.children.begin(); + auto itEnd = n.children.end(); - for (; it != itEnd; ++it) { - traverseNode(n2m, scene, *it, M); - } + for (; it != itEnd; ++it) { + traverseNode(n2m, scene, *it, M); + } } void rasterizeSetBuffers(const tinygltf::Scene & scene) { - totalNumPrimitives = 0; + totalNumPrimitives = 0; - std::map bufferViewDevPointers; + std::map bufferViewDevPointers; - // 1. copy all `bufferViews` to device memory - { - std::map::const_iterator it( - scene.bufferViews.begin()); - std::map::const_iterator itEnd( - scene.bufferViews.end()); + // 1. copy all `bufferViews` to device memory + { + std::map::const_iterator it( + scene.bufferViews.begin()); + std::map::const_iterator itEnd( + scene.bufferViews.end()); - for (; it != itEnd; it++) { - const std::string key = it->first; - const tinygltf::BufferView &bufferView = it->second; - if (bufferView.target == 0) { - continue; // Unsupported bufferView. - } + for (; it != itEnd; it++) { + const std::string key = it->first; + const tinygltf::BufferView &bufferView = it->second; + if (bufferView.target == 0) { + continue; // Unsupported bufferView. + } - const tinygltf::Buffer &buffer = scene.buffers.at(bufferView.buffer); + const tinygltf::Buffer &buffer = scene.buffers.at(bufferView.buffer); - BufferByte* dev_bufferView; - cudaMalloc(&dev_bufferView, bufferView.byteLength); - cudaMemcpy(dev_bufferView, &buffer.data.front() + bufferView.byteOffset, bufferView.byteLength, cudaMemcpyHostToDevice); + BufferByte* dev_bufferView; + cudaMalloc(&dev_bufferView, bufferView.byteLength); + cudaMemcpy(dev_bufferView, &buffer.data.front() + bufferView.byteOffset, bufferView.byteLength, cudaMemcpyHostToDevice); - checkCUDAError("Set BufferView Device Mem"); + checkCUDAError("Set BufferView Device Mem"); - bufferViewDevPointers.insert(std::make_pair(key, dev_bufferView)); + bufferViewDevPointers.insert(std::make_pair(key, dev_bufferView)); - } - } + } + } + // 2. for each mesh: + // for each primitive: + // build device buffer of indices, materail, and each attributes + // and store these pointers in a map + { + + std::map nodeString2Matrix; + auto rootNodeNamesList = scene.scenes.at(scene.defaultScene); + + { + auto it = rootNodeNamesList.begin(); + auto itEnd = rootNodeNamesList.end(); + for (; it != itEnd; ++it) { + traverseNode(nodeString2Matrix, scene, *it, glm::mat4(1.0f)); + } + } + + // parse through node to access mesh + + auto itNode = nodeString2Matrix.begin(); + auto itEndNode = nodeString2Matrix.end(); + for (; itNode != itEndNode; ++itNode) { + + const tinygltf::Node & N = scene.nodes.at(itNode->first); + const glm::mat4 & matrix = itNode->second; + const glm::mat3 & matrixNormal = glm::transpose(glm::inverse(glm::mat3(matrix))); + + auto itMeshName = N.meshes.begin(); + auto itEndMeshName = N.meshes.end(); + + for (; itMeshName != itEndMeshName; ++itMeshName) { + + const tinygltf::Mesh & mesh = scene.meshes.at(*itMeshName); + + auto res = mesh2PrimitivesMap.insert(std::pair>(mesh.name, std::vector())); + std::vector & primitiveVector = (res.first)->second; + + // for each primitive + for (size_t i = 0; i < mesh.primitives.size(); i++) { + const tinygltf::Primitive &primitive = mesh.primitives[i]; + + if (primitive.indices.empty()) + return; + + // TODO: add new attributes for your PrimitiveDevBufPointers when you add new attributes + VertexIndex* dev_indices = NULL; + VertexAttributePosition* dev_position = NULL; + VertexAttributeNormal* dev_normal = NULL; + VertexAttributeTexcoord* dev_texcoord0 = NULL; + + // ----------Indices------------- + + const tinygltf::Accessor &indexAccessor = scene.accessors.at(primitive.indices); + const tinygltf::BufferView &bufferView = scene.bufferViews.at(indexAccessor.bufferView); + BufferByte* dev_bufferView = bufferViewDevPointers.at(indexAccessor.bufferView); + + // assume type is SCALAR for indices + int n = 1; + int numIndices = indexAccessor.count; + int componentTypeByteSize = sizeof(VertexIndex); + int byteLength = numIndices * n * componentTypeByteSize; + + dim3 numThreadsPerBlock(128); + dim3 numBlocks((numIndices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); + cudaMalloc(&dev_indices, byteLength); + _deviceBufferCopy << > > ( + numIndices, + (BufferByte*)dev_indices, + dev_bufferView, + n, + indexAccessor.byteStride, + indexAccessor.byteOffset, + componentTypeByteSize); + + + checkCUDAError("Set Index Buffer"); + + + // ---------Primitive Info------- + + // Warning: LINE_STRIP is not supported in tinygltfloader + int numPrimitives; + PrimitiveType primitiveType; + switch (primitive.mode) { + case TINYGLTF_MODE_TRIANGLES: + primitiveType = PrimitiveType::Triangle; + numPrimitives = numIndices / 3; + break; + case TINYGLTF_MODE_TRIANGLE_STRIP: + primitiveType = PrimitiveType::Triangle; + numPrimitives = numIndices - 2; + break; + case TINYGLTF_MODE_TRIANGLE_FAN: + primitiveType = PrimitiveType::Triangle; + numPrimitives = numIndices - 2; + break; + case TINYGLTF_MODE_LINE: + primitiveType = PrimitiveType::Line; + numPrimitives = numIndices / 2; + break; + case TINYGLTF_MODE_LINE_LOOP: + primitiveType = PrimitiveType::Line; + numPrimitives = numIndices + 1; + break; + case TINYGLTF_MODE_POINTS: + primitiveType = PrimitiveType::Point; + numPrimitives = numIndices; + break; + default: + // output error + break; + }; + + + // ----------Attributes------------- + + auto it(primitive.attributes.begin()); + auto itEnd(primitive.attributes.end()); + + int numInstances = NUM_INSTANCES; + int numVertices = 0; + // for each attribute + for (; it != itEnd; it++) { + const tinygltf::Accessor &accessor = scene.accessors.at(it->second); + const tinygltf::BufferView &bufferView = scene.bufferViews.at(accessor.bufferView); + + int n = 1; + if (accessor.type == TINYGLTF_TYPE_SCALAR) { + n = 1; + } + else if (accessor.type == TINYGLTF_TYPE_VEC2) { + n = 2; + } + else if (accessor.type == TINYGLTF_TYPE_VEC3) { + n = 3; + } + else if (accessor.type == TINYGLTF_TYPE_VEC4) { + n = 4; + } + + BufferByte * dev_bufferView = bufferViewDevPointers.at(accessor.bufferView); + BufferByte ** dev_attribute = NULL; + + numVertices = accessor.count; + int componentTypeByteSize; + + // Note: since the type of our attribute array (dev_position) is static (float32) + // We assume the glTF model attribute type are 5126(FLOAT) here + + if (it->first.compare("POSITION") == 0) { + componentTypeByteSize = sizeof(VertexAttributePosition) / n; + dev_attribute = (BufferByte**)&dev_position; + } + else if (it->first.compare("NORMAL") == 0) { + componentTypeByteSize = sizeof(VertexAttributeNormal) / n; + dev_attribute = (BufferByte**)&dev_normal; + } + else if (it->first.compare("TEXCOORD_0") == 0) { + componentTypeByteSize = sizeof(VertexAttributeTexcoord) / n; + dev_attribute = (BufferByte**)&dev_texcoord0; + } + + std::cout << accessor.bufferView << " - " << it->second << " - " << it->first << '\n'; + + dim3 numThreadsPerBlock(128); + dim3 numBlocks((n * numVertices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); + int byteLength = numVertices * n * componentTypeByteSize; + cudaMalloc(dev_attribute, byteLength); + + _deviceBufferCopy << > > ( + n * numVertices, + *dev_attribute, + dev_bufferView, + n, + accessor.byteStride, + accessor.byteOffset, + componentTypeByteSize); + + std::string msg = "Set Attribute Buffer: " + it->first; + checkCUDAError(msg.c_str()); + } + + // malloc for VertexOut + VertexOut* dev_vertexOut; + cudaMalloc(&dev_vertexOut, numVertices * numInstances * sizeof(VertexOut)); + checkCUDAError("Malloc VertexOut Buffer"); + + // ----------Materials------------- + + // You can only worry about this part once you started to + // implement textures for your rasterizer + TextureData* dev_diffuseTex = NULL; + int diffuseTexWidth = 0; + int diffuseTexHeight = 0; + if (!primitive.material.empty()) { + const tinygltf::Material &mat = scene.materials.at(primitive.material); + printf("material.name = %s\n", mat.name.c_str()); + + if (mat.values.find("diffuse") != mat.values.end()) { + std::string diffuseTexName = mat.values.at("diffuse").string_value; + printf("Diffuse Texture name = %s\n", diffuseTexName.c_str()); + if (scene.textures.find(diffuseTexName) != scene.textures.end()) + { + const tinygltf::Texture &tex = scene.textures.at(diffuseTexName); + if (scene.images.find(tex.source) != scene.images.end()) { + const tinygltf::Image &image = scene.images.at(tex.source); + + size_t s = image.image.size() * sizeof(TextureData); + cudaMalloc(&dev_diffuseTex, s); + cudaMemcpy(dev_diffuseTex, &image.image.at(0), s, cudaMemcpyHostToDevice); + + diffuseTexWidth = image.width; + diffuseTexHeight = image.height; + + checkCUDAError("Set Texture Image data"); + } + } + + } + + // TODO: write your code for other materails + // You may have to take a look at tinygltfloader + // You can also use the above code loading diffuse material as a start point + } + + + // ---------Node hierarchy transform-------- + cudaDeviceSynchronize(); + + dim3 numBlocksNodeTransform((numVertices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); + _nodeMatrixTransform << > > ( + numVertices, + dev_position, + dev_normal, + matrix, + matrixNormal); + + checkCUDAError("Node hierarchy transformation"); + + // at the end of the for loop of primitive + // push dev pointers to map + primitiveVector.push_back(PrimitiveDevBufPointers{ + primitive.mode, + primitiveType, + numPrimitives, + numIndices, + numVertices, + numInstances, + + 0, + + dev_indices, + dev_position, + dev_normal, + dev_texcoord0, + + dev_diffuseTex, + diffuseTexWidth, + diffuseTexHeight, + + dev_vertexOut //VertexOut + }); + + totalNumPrimitives += (numPrimitives * numInstances); + + } // for each primitive + + } // for each mesh + } // for each node - // 2. for each mesh: - // for each primitive: - // build device buffer of indices, materail, and each attributes - // and store these pointers in a map - { - - std::map nodeString2Matrix; - auto rootNodeNamesList = scene.scenes.at(scene.defaultScene); - - { - auto it = rootNodeNamesList.begin(); - auto itEnd = rootNodeNamesList.end(); - for (; it != itEnd; ++it) { - traverseNode(nodeString2Matrix, scene, *it, glm::mat4(1.0f)); - } - } + } - // parse through node to access mesh - - auto itNode = nodeString2Matrix.begin(); - auto itEndNode = nodeString2Matrix.end(); - for (; itNode != itEndNode; ++itNode) { - - const tinygltf::Node & N = scene.nodes.at(itNode->first); - const glm::mat4 & matrix = itNode->second; - const glm::mat3 & matrixNormal = glm::transpose(glm::inverse(glm::mat3(matrix))); - - auto itMeshName = N.meshes.begin(); - auto itEndMeshName = N.meshes.end(); - - for (; itMeshName != itEndMeshName; ++itMeshName) { - - const tinygltf::Mesh & mesh = scene.meshes.at(*itMeshName); - - auto res = mesh2PrimitivesMap.insert(std::pair>(mesh.name, std::vector())); - std::vector & primitiveVector = (res.first)->second; - - // for each primitive - for (size_t i = 0; i < mesh.primitives.size(); i++) { - const tinygltf::Primitive &primitive = mesh.primitives[i]; - - if (primitive.indices.empty()) - return; - - // TODO: add new attributes for your PrimitiveDevBufPointers when you add new attributes - VertexIndex* dev_indices = NULL; - VertexAttributePosition* dev_position = NULL; - VertexAttributeNormal* dev_normal = NULL; - VertexAttributeTexcoord* dev_texcoord0 = NULL; - - // ----------Indices------------- - - const tinygltf::Accessor &indexAccessor = scene.accessors.at(primitive.indices); - const tinygltf::BufferView &bufferView = scene.bufferViews.at(indexAccessor.bufferView); - BufferByte* dev_bufferView = bufferViewDevPointers.at(indexAccessor.bufferView); - - // assume type is SCALAR for indices - int n = 1; - int numIndices = indexAccessor.count; - int componentTypeByteSize = sizeof(VertexIndex); - int byteLength = numIndices * n * componentTypeByteSize; - - dim3 numThreadsPerBlock(128); - dim3 numBlocks((numIndices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); - cudaMalloc(&dev_indices, byteLength); - _deviceBufferCopy << > > ( - numIndices, - (BufferByte*)dev_indices, - dev_bufferView, - n, - indexAccessor.byteStride, - indexAccessor.byteOffset, - componentTypeByteSize); - - - checkCUDAError("Set Index Buffer"); - - - // ---------Primitive Info------- - - // Warning: LINE_STRIP is not supported in tinygltfloader - int numPrimitives; - PrimitiveType primitiveType; - switch (primitive.mode) { - case TINYGLTF_MODE_TRIANGLES: - primitiveType = PrimitiveType::Triangle; - numPrimitives = numIndices / 3; - break; - case TINYGLTF_MODE_TRIANGLE_STRIP: - primitiveType = PrimitiveType::Triangle; - numPrimitives = numIndices - 2; - break; - case TINYGLTF_MODE_TRIANGLE_FAN: - primitiveType = PrimitiveType::Triangle; - numPrimitives = numIndices - 2; - break; - case TINYGLTF_MODE_LINE: - primitiveType = PrimitiveType::Line; - numPrimitives = numIndices / 2; - break; - case TINYGLTF_MODE_LINE_LOOP: - primitiveType = PrimitiveType::Line; - numPrimitives = numIndices + 1; - break; - case TINYGLTF_MODE_POINTS: - primitiveType = PrimitiveType::Point; - numPrimitives = numIndices; - break; - default: - // output error - break; - }; - - - // ----------Attributes------------- - - auto it(primitive.attributes.begin()); - auto itEnd(primitive.attributes.end()); - - int numVertices = 0; - // for each attribute - for (; it != itEnd; it++) { - const tinygltf::Accessor &accessor = scene.accessors.at(it->second); - const tinygltf::BufferView &bufferView = scene.bufferViews.at(accessor.bufferView); - - int n = 1; - if (accessor.type == TINYGLTF_TYPE_SCALAR) { - n = 1; - } - else if (accessor.type == TINYGLTF_TYPE_VEC2) { - n = 2; - } - else if (accessor.type == TINYGLTF_TYPE_VEC3) { - n = 3; - } - else if (accessor.type == TINYGLTF_TYPE_VEC4) { - n = 4; - } - - BufferByte * dev_bufferView = bufferViewDevPointers.at(accessor.bufferView); - BufferByte ** dev_attribute = NULL; - - numVertices = accessor.count; - int componentTypeByteSize; - - // Note: since the type of our attribute array (dev_position) is static (float32) - // We assume the glTF model attribute type are 5126(FLOAT) here - - if (it->first.compare("POSITION") == 0) { - componentTypeByteSize = sizeof(VertexAttributePosition) / n; - dev_attribute = (BufferByte**)&dev_position; - } - else if (it->first.compare("NORMAL") == 0) { - componentTypeByteSize = sizeof(VertexAttributeNormal) / n; - dev_attribute = (BufferByte**)&dev_normal; - } - else if (it->first.compare("TEXCOORD_0") == 0) { - componentTypeByteSize = sizeof(VertexAttributeTexcoord) / n; - dev_attribute = (BufferByte**)&dev_texcoord0; - } - - std::cout << accessor.bufferView << " - " << it->second << " - " << it->first << '\n'; - - dim3 numThreadsPerBlock(128); - dim3 numBlocks((n * numVertices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); - int byteLength = numVertices * n * componentTypeByteSize; - cudaMalloc(dev_attribute, byteLength); - - _deviceBufferCopy << > > ( - n * numVertices, - *dev_attribute, - dev_bufferView, - n, - accessor.byteStride, - accessor.byteOffset, - componentTypeByteSize); - - std::string msg = "Set Attribute Buffer: " + it->first; - checkCUDAError(msg.c_str()); - } - - // malloc for VertexOut - VertexOut* dev_vertexOut; - cudaMalloc(&dev_vertexOut, numVertices * sizeof(VertexOut)); - checkCUDAError("Malloc VertexOut Buffer"); - - // ----------Materials------------- - - // You can only worry about this part once you started to - // implement textures for your rasterizer - TextureData* dev_diffuseTex = NULL; - int diffuseTexWidth = 0; - int diffuseTexHeight = 0; - if (!primitive.material.empty()) { - const tinygltf::Material &mat = scene.materials.at(primitive.material); - printf("material.name = %s\n", mat.name.c_str()); - - if (mat.values.find("diffuse") != mat.values.end()) { - std::string diffuseTexName = mat.values.at("diffuse").string_value; - if (scene.textures.find(diffuseTexName) != scene.textures.end()) { - const tinygltf::Texture &tex = scene.textures.at(diffuseTexName); - if (scene.images.find(tex.source) != scene.images.end()) { - const tinygltf::Image &image = scene.images.at(tex.source); - - size_t s = image.image.size() * sizeof(TextureData); - cudaMalloc(&dev_diffuseTex, s); - cudaMemcpy(dev_diffuseTex, &image.image.at(0), s, cudaMemcpyHostToDevice); - - diffuseTexWidth = image.width; - diffuseTexHeight = image.height; - - checkCUDAError("Set Texture Image data"); - } - } - } - - // TODO: write your code for other materails - // You may have to take a look at tinygltfloader - // You can also use the above code loading diffuse material as a start point - } - - - // ---------Node hierarchy transform-------- - cudaDeviceSynchronize(); - - dim3 numBlocksNodeTransform((numVertices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); - _nodeMatrixTransform << > > ( - numVertices, - dev_position, - dev_normal, - matrix, - matrixNormal); - - checkCUDAError("Node hierarchy transformation"); - - // at the end of the for loop of primitive - // push dev pointers to map - primitiveVector.push_back(PrimitiveDevBufPointers{ - primitive.mode, - primitiveType, - numPrimitives, - numIndices, - numVertices, - - dev_indices, - dev_position, - dev_normal, - dev_texcoord0, - - dev_diffuseTex, - diffuseTexWidth, - diffuseTexHeight, - - dev_vertexOut //VertexOut - }); - - totalNumPrimitives += numPrimitives; - - } // for each primitive - - } // for each mesh - - } // for each node - - } - - - // 3. Malloc for dev_primitives - { - cudaMalloc(&dev_primitives, totalNumPrimitives * sizeof(Primitive)); - } - - - // Finally, cudaFree raw dev_bufferViews - { - - std::map::const_iterator it(bufferViewDevPointers.begin()); - std::map::const_iterator itEnd(bufferViewDevPointers.end()); - - //bufferViewDevPointers - - for (; it != itEnd; it++) { - cudaFree(it->second); - } - - checkCUDAError("Free BufferView Device Mem"); - } + // 3. Malloc for dev_primitives + { + cudaMalloc(&dev_primitives, totalNumPrimitives * sizeof(Primitive)); + } + + + // Finally, cudaFree raw dev_bufferViews + { + + std::map::const_iterator it(bufferViewDevPointers.begin()); + std::map::const_iterator itEnd(bufferViewDevPointers.end()); + //bufferViewDevPointers + for (; it != itEnd; it++) { + cudaFree(it->second); + } + + checkCUDAError("Free BufferView Device Mem"); + } } +//------------------------- SHADERS ----------------------------------------------------------------- + +static int curPrimitiveBeginId = 0; +__global__ +void _vertexTransformAndAssembly(PrimitiveDevBufPointers primitive, glm::mat4 MVP, glm::mat4 MV, glm::mat3 MV_normal, glm::mat3 M_InverseTranspose, glm::vec3 eyePos, int width, int height) +{ + // vertex id + const int vid = (blockIdx.x * blockDim.x) + threadIdx.x; + const int numVertices = primitive.numVertices; + const int numInstances = primitive.numInstances; + + if (vid < numVertices) + { + // Attributes + const glm::vec3 inPos = primitive.dev_position[vid]; + const glm::vec3 inNormal = primitive.dev_normal[vid]; + + // This is in NDC Space + glm::vec4 outPos = MVP * glm::vec4(inPos, 1.f); + outPos /= outPos.w; + + // Convert to screen Space + const glm::vec4 screenPos = NDCToScreenSpace(&outPos, width, height); + + // TODO : Change this + const float instanceOffset = 100.f; + + const int gridSize = 3;// glm::sqrt(numInstances); + + for (int instanceId = 0; instanceId < numInstances; ++instanceId) + { + const float offsetX = (instanceId % gridSize) * instanceOffset; + const float offsetY = (instanceId / gridSize) * instanceOffset; + + // Output of vertex shader + primitive.dev_verticesOut[vid * numInstances + instanceId].pos = screenPos + glm::vec4(offsetX, offsetY, 0.f, 0.f);// instanceOffset * float(instanceId); + primitive.dev_verticesOut[vid * numInstances + instanceId].eyePos = glm::vec3(MV * glm::vec4(inPos, 1.f)); + primitive.dev_verticesOut[vid * numInstances + instanceId].eyeNor = MV_normal * inNormal; + + if (primitive.dev_diffuseTex != NULL) + { + primitive.dev_verticesOut[vid * numInstances + instanceId].texcoord0 = primitive.dev_texcoord0[vid]; + } + else + { + primitive.dev_verticesOut[vid * numInstances + instanceId].color = glm::vec3(1.0f);//glm::clamp(glm::vec3(outPos), glm::vec3(0.f), glm::vec3(1.f)); + } + } + } +} -__global__ -void _vertexTransformAndAssembly( - int numVertices, - PrimitiveDevBufPointers primitive, - glm::mat4 MVP, glm::mat4 MV, glm::mat3 MV_normal, - int width, int height) { +__global__ +void _primitiveAssembly(int numIndices, int curPrimitiveBeginId, Primitive* dev_primitives, PrimitiveDevBufPointers primitive) { - // vertex id - int vid = (blockIdx.x * blockDim.x) + threadIdx.x; - if (vid < numVertices) { + // index id + const int iid = (blockIdx.x * blockDim.x) + threadIdx.x; + const int numInstances = primitive.numInstances; + + if (iid < numIndices) { + + // This is primitive assembly for triangles + + for (int instanceId = 0; instanceId < numInstances; ++instanceId) + { + int pid = iid / (int)primitive.primitiveType;; // id for cur primitives vector + if (primitive.primitiveMode == TINYGLTF_MODE_TRIANGLES) + { + const int devBufferIndexPos = primitive.dev_indices[iid] * numInstances + instanceId; + const int primitiveIndexPos = (pid + curPrimitiveBeginId) * numInstances + instanceId; + + dev_primitives[primitiveIndexPos].v[iid % (int)primitive.primitiveType] = primitive.dev_verticesOut[devBufferIndexPos]; + dev_primitives[primitiveIndexPos].primitiveType = primitive.primitiveType; + dev_primitives[primitiveIndexPos].instanceId = instanceId; + + if (primitive.dev_diffuseTex != NULL) + { + dev_primitives[primitiveIndexPos].dev_diffuseTex = primitive.dev_diffuseTex; + dev_primitives[primitiveIndexPos].diffuseTexWidth = primitive.diffuseTexWidth; + dev_primitives[primitiveIndexPos].diffuseTexHeight = primitive.diffuseTexHeight; + } + else + { + dev_primitives[primitiveIndexPos].dev_diffuseTex = NULL; + dev_primitives[primitiveIndexPos].diffuseTexWidth = 0; + dev_primitives[primitiveIndexPos].diffuseTexHeight = 0; + } + } + } + } +} - // TODO: Apply vertex transformation here - // Multiply the MVP matrix for each vertex position, this will transform everything into clipping space - // Then divide the pos by its w element to transform into NDC space - // Finally transform x and y to viewport space +__global__ +void _rasterizePrimitive(int width, int height, int totalNumPrimitives, Primitive* dev_primitives, Fragment* dev_fragmentBuffer, float* dev_depth, unsigned int* mutexLock) +{ + // primitive id + const int primitiveId = (blockIdx.x * blockDim.x) + threadIdx.x; + + if (primitiveId < totalNumPrimitives) + { + Primitive primitive = dev_primitives[primitiveId]; + + PrimitiveType test = Triangle; + + if (test == Triangle) + { + // Vertices in screen Space + VertexOut v1 = primitive.v[0]; + VertexOut v2 = primitive.v[1]; + VertexOut v3 = primitive.v[2]; + + glm::vec3 triangle[3]; + triangle[0] = glm::vec3(v1.pos); + triangle[1] = glm::vec3(v2.pos); + triangle[2] = glm::vec3(v3.pos); + + TextureData* dev_diffuseTex = primitive.dev_diffuseTex; + int textureWidth = primitive.diffuseTexWidth; + int textureHeight = primitive.diffuseTexHeight; + + const AABB bounds = getAABBForTriangle(triangle); + + // Clamp To Screen + float minX = glm::clamp(bounds.min[0], 0.f, width - 1.f); + float maxX = glm::clamp(bounds.max[0], 0.f, width - 1.f); + float minY = glm::clamp(bounds.min[1], 0.f, height - 1.f); + float maxY = glm::clamp(bounds.max[1], 0.f, height - 1.f); + + for (int row = minY; row <= maxY; ++row) + { + for (int col = minX; col <= maxX; ++col) + { + const int pixelIndex = col + row * width; + const glm::vec2 currPos(col, row); + + // Calculate BaryCentric coordinates + const glm::vec3 baryCoord = calculateBarycentricCoordinate(triangle, currPos); + + // Check if point is inside triangle + const bool isInside = isBarycentricCoordInBounds(baryCoord); + + if (isInside) + { + // Get the interop depth + const float currDepth = -getZAtCoordinate(baryCoord, triangle); + + bool isSet; + do { + isSet = (atomicCAS(&mutexLock[pixelIndex], 0, 1) == 0); + if (isSet) + { + if (currDepth < dev_depth[pixelIndex]) + { + dev_fragmentBuffer[pixelIndex].eyeNor = (baryCoord.x * v1.eyeNor) + (baryCoord.y * v2.eyeNor) + (baryCoord.z * v3.eyeNor); + dev_fragmentBuffer[pixelIndex].eyePos = (baryCoord.x * v1.eyePos) + (baryCoord.y * v2.eyePos) + (baryCoord.z * v3.eyePos); + + if (dev_diffuseTex != NULL) + { + glm::vec2 textureCoord = baryCoord[0] * v1.texcoord0 + baryCoord[1] * v2.texcoord0 + baryCoord[2] * v3.texcoord0; + textureCoord = glm::vec2(textureCoord.x * textureWidth, textureCoord.y * textureHeight); + + const int u = int(textureCoord.x); + const int v = int(textureCoord.y); + +#ifdef BI_LINEAR_INTERPOLATION + dev_fragmentBuffer[pixelIndex].color = sampleTextureBiLinear(dev_diffuseTex, u, v, glm::fract(textureCoord), textureWidth, BYTES_PER_PIXEL); +#else + dev_fragmentBuffer[pixelIndex].color = sampleTextureSimple(dev_diffuseTex, u, v, textureWidth, BYTES_PER_PIXEL); +#endif + } + else + { + dev_fragmentBuffer[pixelIndex].color = (baryCoord.x * v1.color) + (baryCoord.y * v2.color) + (baryCoord.z * v3.color); + } + dev_depth[pixelIndex] = currDepth; + } + } + if (isSet) + { + mutexLock[pixelIndex] = 0; + } + } while (!isSet); + + } + } + } + } + else if (test == Line) + { + // Vertices in screen Space + VertexOut v1 = primitive.v[0]; + VertexOut v2 = primitive.v[1]; + VertexOut v3 = primitive.v[2]; + + glm::vec3 triangle[3]; + triangle[0] = glm::vec3(v1.pos); + triangle[1] = glm::vec3(v2.pos); + triangle[2] = glm::vec3(v3.pos); + + TextureData* dev_diffuseTex = primitive.dev_diffuseTex; + int textureWidth = primitive.diffuseTexWidth; + int textureHeight = primitive.diffuseTexHeight; + + const AABB bounds = getAABBForTriangle(triangle); + + // Clamp To Screen + float minX = glm::clamp(bounds.min[0], 0.f, width - 1.f); + float maxX = glm::clamp(bounds.max[0], 0.f, width - 1.f); + float minY = glm::clamp(bounds.min[1], 0.f, height - 1.f); + float maxY = glm::clamp(bounds.max[1], 0.f, height - 1.f); + + for (int row = minY; row <= maxY; ++row) + { + for (int col = minX; col <= maxX; ++col) + { + const int pixelIndex = col + row * width; + const glm::vec2 currPos(col, row); + + // Calculate BaryCentric coordinates + const glm::vec3 baryCoord = calculateBarycentricCoordinate(triangle, currPos); + + // Check if point is inside triangle + const bool isInside = isBarycentricCoordOnBoundary(baryCoord); + + if (isInside) + { + // Get the interop depth + const float currDepth = -getZAtCoordinate(baryCoord, triangle); + + bool isSet; + do { + isSet = (atomicCAS(&mutexLock[pixelIndex], 0, 1) == 0); + if (isSet) + { + if (currDepth < dev_depth[pixelIndex]) + { + dev_fragmentBuffer[pixelIndex].eyeNor = (baryCoord.x * v1.eyeNor) + (baryCoord.y * v2.eyeNor) + (baryCoord.z * v3.eyeNor); + dev_fragmentBuffer[pixelIndex].eyePos = (baryCoord.x * v1.eyePos) + (baryCoord.y * v2.eyePos) + (baryCoord.z * v3.eyePos); + + if (dev_diffuseTex != NULL) + { + glm::vec2 textureCoord = baryCoord[0] * v1.texcoord0 + baryCoord[1] * v2.texcoord0 + baryCoord[2] * v3.texcoord0; + textureCoord = glm::vec2(textureCoord.x * textureWidth, textureCoord.y * textureHeight); + + const int u = int(textureCoord.x); + const int v = int(textureCoord.y); + +#ifdef BI_LINEAR_INTERPOLATION + dev_fragmentBuffer[pixelIndex].color = sampleTextureBiLinear(dev_diffuseTex, u, v, glm::fract(textureCoord), textureWidth, BYTES_PER_PIXEL); +#else + dev_fragmentBuffer[pixelIndex].color = sampleTextureSimple(dev_diffuseTex, u, v, textureWidth, BYTES_PER_PIXEL); +#endif + } + else + { + dev_fragmentBuffer[pixelIndex].color = (baryCoord.x * v1.color) + (baryCoord.y * v2.color) + (baryCoord.z * v3.color); + } + dev_depth[pixelIndex] = currDepth; + } + } + if (isSet) + { + mutexLock[pixelIndex] = 0; + } + } while (!isSet); + + } + } + } + + } + else if (test == Point) + { + // Vertices in screen Space + VertexOut v1 = primitive.v[0]; + VertexOut v2 = primitive.v[1]; + VertexOut v3 = primitive.v[2]; + + glm::vec3 triangle[3]; + triangle[0] = glm::vec3(v1.pos); + triangle[1] = glm::vec3(v2.pos); + triangle[2] = glm::vec3(v3.pos); + + TextureData* dev_diffuseTex = primitive.dev_diffuseTex; + int textureWidth = primitive.diffuseTexWidth; + int textureHeight = primitive.diffuseTexHeight; + + const AABB bounds = getAABBForTriangle(triangle); + + // Clamp To Screen + float minX = glm::clamp(bounds.min[0], 0.f, width - 1.f); + float maxX = glm::clamp(bounds.max[0], 0.f, width - 1.f); + float minY = glm::clamp(bounds.min[1], 0.f, height - 1.f); + float maxY = glm::clamp(bounds.max[1], 0.f, height - 1.f); + + for (int row = minY; row <= maxY; ++row) + { + for (int col = minX; col <= maxX; ++col) + { + const int pixelIndex = col + row * width; + const glm::vec2 currPos(col, row); + + // Calculate BaryCentric coordinates + const glm::vec3 baryCoord = calculateBarycentricCoordinate(triangle, currPos); + + // Check if point is on the vertices of the triangle + const bool isInside = isBarycentricCoordOnVertices(triangle, currPos); + + if (isInside) + { + // Get the interop depth + const float currDepth = -getZAtCoordinate(baryCoord, triangle); + + bool isSet; + do { + isSet = (atomicCAS(&mutexLock[pixelIndex], 0, 1) == 0); + if (isSet) + { + if (currDepth < dev_depth[pixelIndex]) + { + dev_fragmentBuffer[pixelIndex].eyeNor = (baryCoord.x * v1.eyeNor) + (baryCoord.y * v2.eyeNor) + (baryCoord.z * v3.eyeNor); + dev_fragmentBuffer[pixelIndex].eyePos = (baryCoord.x * v1.eyePos) + (baryCoord.y * v2.eyePos) + (baryCoord.z * v3.eyePos); + + if (dev_diffuseTex != NULL) + { + glm::vec2 textureCoord = baryCoord[0] * v1.texcoord0 + baryCoord[1] * v2.texcoord0 + baryCoord[2] * v3.texcoord0; + textureCoord = glm::vec2(textureCoord.x * textureWidth, textureCoord.y * textureHeight); + + const int u = int(textureCoord.x); + const int v = int(textureCoord.y); + +#ifdef BI_LINEAR_INTERPOLATION + dev_fragmentBuffer[pixelIndex].color = sampleTextureBiLinear(dev_diffuseTex, u, v, glm::fract(textureCoord), textureWidth, BYTES_PER_PIXEL); +#else + dev_fragmentBuffer[pixelIndex].color = sampleTextureSimple(dev_diffuseTex, u, v, textureWidth, BYTES_PER_PIXEL); +#endif + } + else + { + dev_fragmentBuffer[pixelIndex].color = (baryCoord.x * v1.color) + (baryCoord.y * v2.color) + (baryCoord.z * v3.color); + } + dev_depth[pixelIndex] = currDepth; + } + } + if (isSet) + { + mutexLock[pixelIndex] = 0; + } + } while (!isSet); + + } + } + } + + } + } +} - // TODO: Apply vertex assembly here - // Assemble all attribute arraies into the primitive array - - } +__global__ +void _rasterizePrimitiveWithMSAA(int width, int height, int totalNumPrimitives, Primitive* dev_primitives, Fragment* dev_fragmentBuffer, float* dev_depth, unsigned int* mutexLock) +{ + // primitive id + /*const int primitiveId = (blockIdx.x * blockDim.x) + threadIdx.x; + + if (primitiveId < totalNumPrimitives) + { + Primitive primitive = dev_primitives[primitiveId]; + + if (primitive.primitiveType == Triangle) + { + // Vertices in screen Space + VertexOut v1 = primitive.v[0]; + VertexOut v2 = primitive.v[1]; + VertexOut v3 = primitive.v[2]; + + glm::vec3 triangle[3]; + triangle[0] = glm::vec3(v1.pos); + triangle[1] = glm::vec3(v2.pos); + triangle[2] = glm::vec3(v3.pos); + + TextureData* dev_diffuseTex = primitive.dev_diffuseTex; + int textureWidth = primitive.diffuseTexWidth; + int textureHeight = primitive.diffuseTexHeight; + + const AABB bounds = getAABBForTriangle(triangle); + + // Clamp To Screen + float minX = glm::clamp(bounds.min[0], 0.f, width - 1.f); + float maxX = glm::clamp(bounds.max[0], 0.f, width - 1.f); + float minY = glm::clamp(bounds.min[1], 0.f, height - 1.f); + float maxY = glm::clamp(bounds.max[1], 0.f, height - 1.f); + + for (int row = minY; row <= maxY; row += MSAA_LEVEL) + { + for (int col = minX; col <= maxX; col += MSAA_LEVEL) + { + + + bool isInside = false; + + const int centerPixelIndex = (col + MSAA_LEVEL / 2) + (row * MSAA_LEVEL / 2) * width; + const glm::vec2 centerPixelPos((col + MSAA_LEVEL / 2), (row * MSAA_LEVEL / 2)); + const glm::vec3 centerBaryCoord = calculateBarycentricCoordinate(triangle, centerPixelPos); + + for (int subSampleX = 0; subSampleX < MSAA_LEVEL; ++subSampleX) + { + for (int subSampleY = 0; subSampleY < MSAA_LEVEL; ++subSampleY) + { + + const glm::vec2 currPosSubSample(col + subSampleX + subSampleY, row); + // Calculate BaryCentric coordinates + subSampleBarCoord[subSampleX][subSampleY] = calculateBarycentricCoordinate(triangle, currPosSubSample); + + // Check if point is inside triangle + const bool subSampleHit = isBarycentricCoordInBounds(subSampleBarCoord[subSampleX][subSampleY]); + subSubSampleHit[subSampleX][subSampleY] = subSampleHit; + if (subSampleHit) + { + isInside = true; + } + } + } + + if (isInside) + { + + // 1. Sample the center pixel + glm::vec3 centerPixelColor(0.f); + + bool isSet; + do { + isSet = (atomicCAS(&mutexLock[centerPixelIndex], 0, 1) == 0); + if (isSet) + { + if (currDepth < dev_depth[centerPixelIndex]) + { + dev_fragmentBuffer[centerPixelIndex].eyeNor = (centerBaryCoord.x * v1.eyeNor) + (centerBaryCoord.y * v2.eyeNor) + (centerBaryCoord.z * v3.eyeNor); + dev_fragmentBuffer[centerPixelIndex].eyePos = (centerBaryCoord.x * v1.eyePos) + (centerBaryCoord.y * v2.eyePos) + (centerBaryCoord.z * v3.eyePos); + + if (dev_diffuseTex != NULL) + { + glm::vec2 textureCoord = (centerBaryCoord.x * v1.texcoord0) + (centerBaryCoord.y * v2.texcoord0) + (centerBaryCoord.z * v3.texcoord0); + textureCoord = glm::vec2(textureCoord.x * textureWidth, textureCoord.y * textureHeight); + + textureCoord = glm::clamp(textureCoord, glm::vec2(0.f), glm::vec2(textureWidth - 1, textureHeight - 1)); + + // Apparently there are 3 bytes per pixel based on the texture array size and texture size. + const int startPixelIndex = int(textureCoord.x + textureCoord.y * textureWidth) * 3; + float r = dev_diffuseTex[startPixelIndex]; + float g = dev_diffuseTex[startPixelIndex + 1]; + float b = dev_diffuseTex[startPixelIndex + 2]; + centerPixelColor = glm::vec3(r, g, b) / 255.f; + dev_fragmentBuffer[pixelIndex].color = centerPixelColor; + } + else + { + centerPixelColor = glm::vec3(1.0f);// (baryCoord.x * v1.color) + (baryCoord.y * v2.color) + (baryCoord.z * v3.color); + dev_fragmentBuffer[centerPixelIndex].color = centerPixelColor; + } + dev_depth[centerPixelIndex] = currDepth; + } + } + if (isSet) + { + mutexLock[centerPixelIndex] = 0; + } + } while (!isSet); + + // 2. Do a depth test for all the remaining subSamples and update framebuffer + + + for (int subSampleX = 0; subSampleX < MSAA_LEVEL; ++subSampleX) + { + for (int subSampleY = 0; subSampleY < MSAA_LEVEL; ++subSampleY) + { + const glm::vec2 currPosSubSample(col + subSampleX + subSampleY, row); + + } + } + } + } + } + } + }*/ } +//#define MAT_PLAIN +#define MAT_LAMBERT +//#define MAT_BLINN_PHONG -static int curPrimitiveBeginId = 0; +#define BLINN_PHONEXP 64.f +#define AMBIENT_LIGHT 0.2f -__global__ -void _primitiveAssembly(int numIndices, int curPrimitiveBeginId, Primitive* dev_primitives, PrimitiveDevBufPointers primitive) { +/** +* Writes fragment colors to the framebuffer +*/ +__global__ +void render(int w, int h, Fragment *fragmentBuffer, glm::vec3 *framebuffer) +{ + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + int index = x + (y * w); + + if (x < w && y < h) + { + + // No Shading +#ifdef MAT_PLAIN + framebuffer[index] = fragmentBuffer[index].color; +#endif - // index id - int iid = (blockIdx.x * blockDim.x) + threadIdx.x; + // Lamberts Material +#ifdef MAT_LAMBERT - if (iid < numIndices) { + float lambertsTerm = glm::abs(glm::dot(fragmentBuffer[index].eyeNor, glm::vec3(0, 0, 1))) + AMBIENT_LIGHT; + lambertsTerm = glm::clamp(lambertsTerm, 0.f, 1.f); + framebuffer[index] = fragmentBuffer[index].color * lambertsTerm; +#endif - // TODO: uncomment the following code for a start - // This is primitive assembly for triangles +#ifdef MAT_BLINN_PHONG - //int pid; // id for cur primitives vector - //if (primitive.primitiveMode == TINYGLTF_MODE_TRIANGLES) { - // pid = iid / (int)primitive.primitiveType; - // dev_primitives[pid + curPrimitiveBeginId].v[iid % (int)primitive.primitiveType] - // = primitive.dev_verticesOut[primitive.dev_indices[iid]]; - //} + const glm::vec3 fsNorm = fragmentBuffer[index].eyeNor; + const glm::vec3 fsCamera = glm::vec3(0, 0, 1.f); + const glm::vec3 hVec = (fsNorm + fsCamera) / 2.f; + const float specular = glm::max(glm::pow(glm::dot(glm::normalize(hVec), glm::normalize(fsNorm)), BLINN_PHONEXP), 0.f); + const float lambertsTerm = glm::clamp(glm::abs(glm::dot(fsNorm, fsCamera)) + AMBIENT_LIGHT, 0.f, 1.f); - // TODO: other primitive types (point, line) - } - + framebuffer[index] = fragmentBuffer[index].color * (lambertsTerm + specular); +#endif + + } } +#define SSAA_UNIFORM_GRID + +/** +* Performs Anti-Aliasing +*/ +__global__ +void _SSAA(int downWidth, int downHeight, int originalWidth, int originalHeight, glm::vec3 *inputFrameBuffer, glm::vec3 *aaFrameBuffer) +{ + int x = (blockIdx.x * blockDim.x) + threadIdx.x; + int y = (blockIdx.y * blockDim.y) + threadIdx.y; + int index = x + (y * downWidth); + + if (x < downWidth && y < downHeight) + { +#ifdef SSAA_UNIFORM_GRID + + const int upscaledIndexX = x * SSAA_LEVEL; + const int upscaledIndexY = y * SSAA_LEVEL; + + glm::vec3 averageColor(0.f); + int numColors = 0; + + for (int i = 0; i < SSAA_LEVEL; ++i) + { + for (int j = 0; j < SSAA_LEVEL; ++j) + { + const int newIndex = (upscaledIndexX + i) + ((upscaledIndexY + i) * originalWidth); + averageColor += inputFrameBuffer[newIndex]; + numColors++; + } + } + aaFrameBuffer[index] = (averageColor / float(numColors)); + +#endif // SSAA_UNIFORM_GRID + + } +} /** * Perform rasterization. */ -void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const glm::mat3 MV_normal) { +void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const glm::mat3& MV_normal, const glm::mat3& M_inverseTranspose, const glm::vec3& eyePos) { + int sideLength2d = 8; dim3 blockSize2d(sideLength2d, sideLength2d); - dim3 blockCount2d((width - 1) / blockSize2d.x + 1, - (height - 1) / blockSize2d.y + 1); - - // Execute your rasterization pipeline here - // (See README for rasterization pipeline outline.) - - // Vertex Process & primitive assembly - { - curPrimitiveBeginId = 0; - dim3 numThreadsPerBlock(128); - - auto it = mesh2PrimitivesMap.begin(); - auto itEnd = mesh2PrimitivesMap.end(); - - for (; it != itEnd; ++it) { - auto p = (it->second).begin(); // each primitive - auto pEnd = (it->second).end(); - for (; p != pEnd; ++p) { - dim3 numBlocksForVertices((p->numVertices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); - dim3 numBlocksForIndices((p->numIndices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); - - _vertexTransformAndAssembly << < numBlocksForVertices, numThreadsPerBlock >> >(p->numVertices, *p, MVP, MV, MV_normal, width, height); - checkCUDAError("Vertex Processing"); - cudaDeviceSynchronize(); - _primitiveAssembly << < numBlocksForIndices, numThreadsPerBlock >> > - (p->numIndices, - curPrimitiveBeginId, - dev_primitives, - *p); - checkCUDAError("Primitive Assembly"); - - curPrimitiveBeginId += p->numPrimitives; - } - } - - checkCUDAError("Vertex Processing and Primitive Assembly"); - } - - cudaMemset(dev_fragmentBuffer, 0, width * height * sizeof(Fragment)); - initDepth << > >(width, height, dev_depth); - - // TODO: rasterize - - - - // Copy depthbuffer colors into framebuffer - render << > >(width, height, dev_fragmentBuffer, dev_framebuffer); - checkCUDAError("fragment shader"); + + dim3 blockCount2d((width - 1) / blockSize2d.x + 1, + (height - 1) / blockSize2d.y + 1); + + dim3 blockAACount2d((originalWidth - 1) / blockSize2d.x + 1, + (originalHeight - 1) / blockSize2d.y + 1); + + dim3 numThreadsPerBlock(128); + dim3 blocksPrimitives((totalNumPrimitives + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); + + // Execute your rasterization pipeline here + // (See README for rasterization pipeline outline.) + + // Vertex Process & primitive assembly + { + curPrimitiveBeginId = 0; + + auto it = mesh2PrimitivesMap.begin(); + auto itEnd = mesh2PrimitivesMap.end(); + + for (; it != itEnd; ++it) + { + auto p = (it->second).begin(); + auto pEnd = (it->second).end(); + + for (; p != pEnd; ++p) + { + dim3 numBlocksForVertices((p->numVertices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); + dim3 numBlocksForIndices((p->numIndices + numThreadsPerBlock.x - 1) / numThreadsPerBlock.x); + + // 1. Vertex Assembly and Shader + _vertexTransformAndAssembly << < numBlocksForVertices, numThreadsPerBlock >> > (*p, MVP, MV, MV_normal, M_inverseTranspose, eyePos, width, height); + checkCUDAError("Vertex Processing"); + + cudaDeviceSynchronize(); + + // 2. Primitive Assembly + _primitiveAssembly << < numBlocksForIndices, numThreadsPerBlock >> > + (p->numIndices, + curPrimitiveBeginId, + dev_primitives, + *p); + checkCUDAError("Primitive Assembly"); + + curPrimitiveBeginId += (p->numPrimitives * p->numInstances); + } + } + + checkCUDAError("Vertex Processing and Primitive Assembly"); + } + + cudaMemset(dev_fragmentBuffer, 0, width * height * sizeof(Fragment)); + + // 3. Depth Check + initDepth << > > (width, height, dev_depth); + + // 4. Rasterize - Call per primitive +#ifdef AA_MULTI_SAMPLE + _rasterizePrimitive << > > (width, height, totalNumPrimitives, dev_primitives, dev_fragmentBuffer, dev_depth, dev_mutex); +#else + _rasterizePrimitive << > > (width, height, totalNumPrimitives, dev_primitives, dev_fragmentBuffer, dev_depth, dev_mutex); +#endif + checkCUDAError("Rasterizer"); + + // Copy fragmentBuffer colors into framebuffer + render << > > (width, height, dev_fragmentBuffer, dev_framebuffer); + checkCUDAError("fragment shader"); + + // perform SSAA + _SSAA << < blockAACount2d, blockSize2d >> > (originalWidth, originalHeight, width, height, dev_framebuffer, dev_AAFrameBuffer); + checkCUDAError("SSAA"); + // Copy framebuffer into OpenGL buffer for OpenGL previewing - sendImageToPBO<<>>(pbo, width, height, dev_framebuffer); + sendImageToPBO << > > (pbo, originalWidth, originalHeight, dev_AAFrameBuffer); checkCUDAError("copy render result to pbo"); } @@ -741,36 +1329,43 @@ void rasterizeFree() { // deconstruct primitives attribute/indices device buffer - auto it(mesh2PrimitivesMap.begin()); - auto itEnd(mesh2PrimitivesMap.end()); - for (; it != itEnd; ++it) { - for (auto p = it->second.begin(); p != it->second.end(); ++p) { - cudaFree(p->dev_indices); - cudaFree(p->dev_position); - cudaFree(p->dev_normal); - cudaFree(p->dev_texcoord0); - cudaFree(p->dev_diffuseTex); + auto it(mesh2PrimitivesMap.begin()); + auto itEnd(mesh2PrimitivesMap.end()); + for (; it != itEnd; ++it) { + for (auto p = it->second.begin(); p != it->second.end(); ++p) { + cudaFree(p->dev_indices); + cudaFree(p->dev_position); + cudaFree(p->dev_normal); + cudaFree(p->dev_texcoord0); + cudaFree(p->dev_diffuseTex); + + cudaFree(p->dev_verticesOut); - cudaFree(p->dev_verticesOut); - - //TODO: release other attributes and materials - } - } + //TODO: release other attributes and materials + } + } - //////////// + //////////// cudaFree(dev_primitives); dev_primitives = NULL; - cudaFree(dev_fragmentBuffer); - dev_fragmentBuffer = NULL; + cudaFree(dev_fragmentBuffer); + dev_fragmentBuffer = NULL; cudaFree(dev_framebuffer); dev_framebuffer = NULL; - cudaFree(dev_depth); - dev_depth = NULL; + cudaFree(dev_AAFrameBuffer); + dev_AAFrameBuffer = NULL; + + cudaFree(dev_depth); + dev_depth = NULL; + + cudaFree(dev_mutex); + dev_mutex = NULL; + checkCUDAError("rasterize Free"); } diff --git a/src/rasterize.h b/src/rasterize.h index 560aae9..a82bbfb 100644 --- a/src/rasterize.h +++ b/src/rasterize.h @@ -20,5 +20,5 @@ namespace tinygltf{ void rasterizeInit(int width, int height); void rasterizeSetBuffers(const tinygltf::Scene & scene); -void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const glm::mat3 MV_normal); +void rasterize(uchar4 *pbo, const glm::mat4 & MVP, const glm::mat4 & MV, const glm::mat3& MV_normal, const glm::mat3& M_inverseTranspose, const glm::vec3& eyePos); void rasterizeFree(); diff --git a/src/rasterizeTools.h b/src/rasterizeTools.h index 46c701e..b86a13c 100644 --- a/src/rasterizeTools.h +++ b/src/rasterizeTools.h @@ -11,6 +11,7 @@ #include #include #include +#include struct AABB { glm::vec3 min; @@ -25,6 +26,16 @@ glm::vec3 multiplyMV(glm::mat4 m, glm::vec4 v) { return glm::vec3(m * v); } +__host__ __device__ static +glm::vec4 NDCToScreenSpace(const glm::vec4* v, int width, int height) +{ + glm::vec4 screenCoords(0.f, 0.f, 0.f, 1.f); + screenCoords[0] = ((*v)[0] + 1.f) * width / 2.f; + screenCoords[1] = (1.f - (*v)[1]) * height / 2.f; + screenCoords[2] = ((*v)[2] + 1.f) / 2.f;; + return screenCoords; +} + // CHECKITOUT /** * Finds the axis aligned bounding box for a given triangle. @@ -33,13 +44,13 @@ __host__ __device__ static AABB getAABBForTriangle(const glm::vec3 tri[3]) { AABB aabb; aabb.min = glm::vec3( - min(min(tri[0].x, tri[1].x), tri[2].x), - min(min(tri[0].y, tri[1].y), tri[2].y), - min(min(tri[0].z, tri[1].z), tri[2].z)); + glm::min(glm::min(tri[0].x, tri[1].x), tri[2].x), + glm::min(glm::min(tri[0].y, tri[1].y), tri[2].y), + glm::min(glm::min(tri[0].z, tri[1].z), tri[2].z)); aabb.max = glm::vec3( - max(max(tri[0].x, tri[1].x), tri[2].x), - max(max(tri[0].y, tri[1].y), tri[2].y), - max(max(tri[0].z, tri[1].z), tri[2].z)); + glm::max(glm::max(tri[0].x, tri[1].x), tri[2].x), + glm::max(glm::max(tri[0].y, tri[1].y), tri[2].y), + glm::max(glm::max(tri[0].z, tri[1].z), tri[2].z)); return aabb; } @@ -88,6 +99,32 @@ bool isBarycentricCoordInBounds(const glm::vec3 barycentricCoord) { barycentricCoord.z >= 0.0 && barycentricCoord.z <= 1.0; } +#define POINT_TOLERANCE 2.5f +#define LINE_TOLERANCE 0.02f + +#define IS_SAME(a,b) abs(a - b) < POINT_TOLERANCE + +/** +* Check if a barycentric coordinate is on the boundary of a triangle. +*/ +__host__ __device__ static +bool isBarycentricCoordOnBoundary(const glm::vec3 barycentricCoord) +{ + return isBarycentricCoordInBounds(barycentricCoord) && (barycentricCoord.x <= LINE_TOLERANCE || barycentricCoord.y <= LINE_TOLERANCE || + barycentricCoord.z <= LINE_TOLERANCE); +} + +/** +* Check if a barycentric coordinate is points of a triangle. +*/ +__host__ __device__ static +bool isBarycentricCoordOnVertices(const glm::vec3 tri[3], glm::vec2 point) +{ + return (IS_SAME(point.x, tri[0].x) && IS_SAME(point.y, tri[0].y)) || + (IS_SAME(point.x, tri[1].x) && IS_SAME(point.y, tri[1].y)) || + (IS_SAME(point.x, tri[2].x) && IS_SAME(point.y, tri[2].y)); +} + // CHECKITOUT /** * For a given barycentric coordinate, compute the corresponding z position @@ -99,3 +136,131 @@ float getZAtCoordinate(const glm::vec3 barycentricCoord, const glm::vec3 tri[3]) + barycentricCoord.y * tri[1].z + barycentricCoord.z * tri[2].z); } + + +/** +* Sample a given texture at the given UV using no filtering +*/ +__host__ __device__ static +glm::vec3 sampleTextureSimple(const unsigned char* textureData, int u, int v, int textureWidth, int bytesPerPixel) +{ + const int startPixelIndex = int(u + v * textureWidth) * bytesPerPixel; + return glm::vec3(textureData[startPixelIndex], textureData[startPixelIndex + 1], textureData[startPixelIndex + 2]); +} + +/** +* Sample a given texture at the given UV using bilinear filtering +*/ +__host__ __device__ static +glm::vec3 sampleTextureBiLinear(const unsigned char* textureData, int u, int v, const glm::vec2& mixRatio, int textureWidth, int bytesPerPixel) +{ + int sampleIndex = (u + v * textureWidth) * bytesPerPixel; + const glm::vec3 sample1 = glm::vec3(textureData[sampleIndex], textureData[sampleIndex + 1], textureData[sampleIndex + 2]); + + sampleIndex = (u + 1 + v * textureWidth) * bytesPerPixel; + const glm::vec3 sample2 = glm::vec3(textureData[sampleIndex], textureData[sampleIndex + 1], textureData[sampleIndex + 2]); + + sampleIndex = (u + (v + 1) * textureWidth) * bytesPerPixel; + const glm::vec3 sample3 = glm::vec3(textureData[sampleIndex], textureData[sampleIndex + 1], textureData[sampleIndex + 2]); + + sampleIndex = (u + 1 + (v + 1) * textureWidth) * bytesPerPixel; + const glm::vec3 sample4 = glm::vec3(textureData[sampleIndex], textureData[sampleIndex + 1], textureData[sampleIndex + 2]); + + const glm::vec3 mixInX = glm::mix(sample2, sample4, mixRatio.x); + const glm::vec3 mixInY = glm::mix(sample1, sample3, mixRatio.x); + + return glm::mix(mixInX, mixInY, mixRatio.y) / 255.f; +} + + +/** +* This class is used for timing the performance +* Uncopyable and unmovable +* +* Adapted from WindyDarian(https://github.com/WindyDarian) +*/ +class PerformanceTimer +{ +public: + PerformanceTimer() + { + cudaEventCreate(&event_start); + cudaEventCreate(&event_end); + } + + ~PerformanceTimer() + { + cudaEventDestroy(event_start); + cudaEventDestroy(event_end); + } + + void startCpuTimer() + { + if (cpu_timer_started) { throw std::runtime_error("CPU timer already started"); } + cpu_timer_started = true; + + time_start_cpu = std::chrono::high_resolution_clock::now(); + } + + void endCpuTimer() + { + time_end_cpu = std::chrono::high_resolution_clock::now(); + + if (!cpu_timer_started) { throw std::runtime_error("CPU timer not started"); } + + std::chrono::duration duro = time_end_cpu - time_start_cpu; + prev_elapsed_time_cpu_milliseconds = + static_cast(duro.count()); + + cpu_timer_started = false; + } + + void startGpuTimer() + { + if (gpu_timer_started) { throw std::runtime_error("GPU timer already started"); } + gpu_timer_started = true; + + cudaEventRecord(event_start); + } + + void endGpuTimer() + { + cudaEventRecord(event_end); + cudaEventSynchronize(event_end); + + if (!gpu_timer_started) { throw std::runtime_error("GPU timer not started"); } + + cudaEventElapsedTime(&prev_elapsed_time_gpu_milliseconds, event_start, event_end); + gpu_timer_started = false; + } + + float getCpuElapsedTimeForPreviousOperation() //noexcept //(damn I need VS 2015 + { + return prev_elapsed_time_cpu_milliseconds; + } + + float getGpuElapsedTimeForPreviousOperation() //noexcept + { + return prev_elapsed_time_gpu_milliseconds; + } + + // remove copy and move functions + PerformanceTimer(const PerformanceTimer&) = delete; + PerformanceTimer(PerformanceTimer&&) = delete; + PerformanceTimer& operator=(const PerformanceTimer&) = delete; + PerformanceTimer& operator=(PerformanceTimer&&) = delete; + +private: + cudaEvent_t event_start = nullptr; + cudaEvent_t event_end = nullptr; + + using time_point_t = std::chrono::high_resolution_clock::time_point; + time_point_t time_start_cpu; + time_point_t time_end_cpu; + + bool cpu_timer_started = false; + bool gpu_timer_started = false; + + float prev_elapsed_time_cpu_milliseconds = 0.f; + float prev_elapsed_time_gpu_milliseconds = 0.f; +};