4 changes: 3 additions & 1 deletion CMakeLists.txt
@@ -13,7 +13,9 @@ if(${CMAKE_SYSTEM_NAME} MATCHES "Darwin")
 elseif(${CMAKE_SYSTEM_NAME} MATCHES "Linux")
 set(EXTERNAL_LIB_PATH "${EXTERNAL}/lib/linux" "/usr/lib64")
 elseif(WIN32)
-if(${MSVC_VERSION} MATCHES "1900")
+if(${MSVC_VERSION} MATCHES "1915")
+set(EXTERNAL_LIB_PATH "${EXTERNAL}/lib/win/vc2015")
+elseif(${MSVC_VERSION} MATCHES "1900")
 set(EXTERNAL_LIB_PATH "${EXTERNAL}/lib/win/vc2015")
 elseif(${MSVC_VERSION} MATCHES "1800")
 set(EXTERNAL_LIB_PATH "${EXTERNAL}/lib/win/vc2013")
57 changes: 51 additions & 6 deletions README.md
@@ -1,11 +1,56 @@
**University of Pennsylvania, CIS 565: GPU Programming and Architecture,
Project 1 - Flocking**

* (TODO) YOUR NAME HERE
* (TODO) [LinkedIn](), [personal website](), [twitter](), etc.
* Tested on: (TODO) Windows 22, i7-2222 @ 2.22GHz 22GB, GTX 222 222MB (Moore 2222 Lab)
* Xiao Zhang
* [LinkedIn](https://www.linkedin.com/in/xiao-zhang-674bb8148/)
* Tested on: Windows 10, i7-7700K @ 4.20GHz 16.0GB, GTX 1080 15.96GB (my own PC)

### (TODO: Your README)
### Screenshot

Include screenshots, analysis, etc. (Remember, this is public, so don't put
anything here that you don't want to share with the world.)
The screenshot shows the uniform grid simulation with coherent storage of position and velocity. It simulates 20000 boids with a CUDA block size of 128, and vertical synchronization is turned on.

![](images/grid_coherent_20000_128_vsync.gif)

### Analysis

#### The chart below shows the framerate of all 3 simulation methods under different block sizes and boid counts.

![](images/chart1.JPG)

#### The graph below shows the framerate of the naive simulation method under different block sizes and boid counts.

![](images/graph1.JPG)

#### The graph below shows the framerate of the uniform grid simulation method, with separate memory storage for position and velocity, under different block sizes and boid counts.

![](images/graph2.JPG)

#### The graph below shows the framerate of the uniform grid simulation method, with coherent memory storage for position and velocity, under different block sizes and boid counts.

![](images/graph3.JPG)

#### The chart below shows the framerate of all 3 simulation methods under different boid counts when visualization is turned on and the CUDA block size is set to 128.

![](images/chart2.JPG)

#### The graph below plots the data from the chart above.

![](images/graph4.JPG)

### Q&A

#### 1. For each implementation, how does changing the number of boids affect performance? Why do you think this is?

Increasing the number of boids lowers the framerate. This is most likely because more data is transferred between the GPU and the CPU in each frame. According to the analysis, the framerate doesn't increase when the block size is larger, which suggests that parallelization is not the bottleneck, since every boid already runs on its own thread. The next most likely cause is therefore data throughput.

#### 2. For each implementation, how does changing the block count and block size affect performance? Why do you think this is?

Changing the block count and block size hardly affects performance. This is because the way we assign the workload always ensures that every boid runs on its own thread. When the block size is small, the block count increases so that everything still finishes in one grid.
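
The relationship between block size and block count can be sketched on the host side in plain C++ (not CUDA). The helper names `blocksForBoids` and `threadsLaunched` are hypothetical and not taken from this project; they only illustrate the ceiling division that keeps one thread per boid whatever the block size is.

```cpp
#include <cassert>

// Hypothetical helper: number of blocks needed so that at least numBoids
// threads exist when each block holds blockSize threads (ceiling division).
int blocksForBoids(int numBoids, int blockSize) {
    assert(numBoids >= 0 && blockSize > 0);
    return (numBoids + blockSize - 1) / blockSize;
}

// Total threads launched. This is always >= numBoids, so a kernel would
// need an index guard to make the surplus threads exit early.
int threadsLaunched(int numBoids, int blockSize) {
    return blocksForBoids(numBoids, blockSize) * blockSize;
}
```

With 20000 boids, a block size of 128 gives 157 blocks and a block size of 1024 gives 20 blocks. In both cases at least 20000 threads are launched, so every boid still gets its own thread.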

#### 3. For the coherent uniform grid: did you experience any performance improvements with the more coherent uniform grid? Was this the outcome you expected? Why or why not?

Yes, and this was the outcome we expected. Instead of accessing position and velocity in a scattered manner multiple times, we use one grid to reshuffle the position and velocity so that we can access them coherently later. The reshuffle itself still accesses position and velocity in a scattered manner, but we only need to do this once. After reshuffling them, we can access them more efficiently because of the GPU cache and the principle of locality.
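
The reshuffle can be pictured with a small CPU stand-in written in plain C++. This is only a sketch under assumptions: the `Vec3` struct, the `reshuffle` function, and the `sortedBoidIndex` array are hypothetical names, not the project's own code. Each loop iteration corresponds to what one GPU thread would do.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Hypothetical reshuffle step: sortedBoidIndex[i] is the original index of
// the boid that occupies slot i after sorting boids by grid cell.
void reshuffle(const std::vector<int>& sortedBoidIndex,
               const std::vector<Vec3>& pos,
               const std::vector<Vec3>& vel,
               std::vector<Vec3>& posCoherent,
               std::vector<Vec3>& velCoherent) {
    assert(pos.size() == vel.size());
    assert(sortedBoidIndex.size() == pos.size());
    posCoherent.resize(pos.size());
    velCoherent.resize(vel.size());
    for (std::size_t i = 0; i < sortedBoidIndex.size(); ++i) {
        const int src = sortedBoidIndex[i];  // scattered read, once per boid
        posCoherent[i] = pos[src];           // contiguous write
        velCoherent[i] = vel[src];
    }
}
```

After this pass, boids that share a grid cell sit next to each other in `posCoherent` and `velCoherent`, so the neighbor search can read them by slot index without the extra indirection.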

#### 4. Did changing cell width and checking 27 vs 8 neighboring cells affect performance? Why or why not? Be careful: it is insufficient (and possibly incorrect) to say that 27-cell is slower simply because there are more cells to check!

Not significantly. It does affect performance to some degree, because there are more cells to check, which potentially means more boids to account for, and the per-boid neighbor search is not parallelized. On the other hand, checking more cells gives each thread a better chance of having an equal amount of work to do (the extreme case being all boids checking all cells, which is basically the naive method). On the GPU this is a good sign, because a warp always waits for its slowest thread to finish, so a balanced workload across threads improves performance.
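
The load-balancing argument can be illustrated with a toy cost model in plain C++. This is an assumption made for illustration, not a measurement from this project: the hypothetical `warpCost` function charges each warp the work of its slowest thread and sums over warps.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: a warp takes as long as its slowest thread, so its cost is the
// maximum per-thread work; the total cost is the sum over all warps.
long long warpCost(const std::vector<int>& workPerThread,
                   std::size_t warpSize = 32) {
    assert(warpSize > 0);
    long long total = 0;
    for (std::size_t start = 0; start < workPerThread.size(); start += warpSize) {
        const std::size_t end = std::min(start + warpSize, workPerThread.size());
        int slowest = 0;
        for (std::size_t i = start; i < end; ++i) {
            slowest = std::max(slowest, workPerThread[i]);
        }
        total += slowest;
    }
    return total;
}
```

Under this model, one warp of 32 threads that each check 10 boids costs 10, while a warp where a single thread checks 320 boids and the rest check none costs 320, even though the total amount of work is the same.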
Binary file added images/chart1.JPG
Binary file added images/chart2.JPG
Binary file added images/graph1.JPG
Binary file added images/graph2.JPG
Binary file added images/graph3.JPG
Binary file added images/graph4.JPG
Binary file added images/grid_coherent_20000_128_vsync.gif
2 changes: 1 addition & 1 deletion src/CMakeLists.txt
@@ -10,5 +10,5 @@ set(SOURCE_FILES

 cuda_add_library(src
 ${SOURCE_FILES}
-OPTIONS -arch=sm_20
+OPTIONS -arch=sm_60
 )