Issues with simulation output and results #308

Open
gFrancoCamilo opened this issue Jun 17, 2024 · 1 comment

Comments

@gFrancoCamilo

Hello,

I've been trying to run this code on Accel-Sim, but I am running into some problems. First, the output of the simulation differs from the output when running on hardware (HW): the simulation shows the ciphertext and plaintext as 0, while the HW run produces different values. I have attached the output file of the code for AES-128 in counter mode, which shows the ciphertext and plaintext as 0 on line 733. Do you know what might be causing this?

Another problem I've been facing (which might be related to the first one) is that the simulated results are far from the HW results. Profiling the application with both Nsight and nvprof gives similar results of around 5B cycles, but the simulation reports only 137k cycles. I've tried using the Tuner to get more accurate simulations, but the results were the same. Do you know what might be causing this behavior? Here are some of the results: 256-ctr, 128-ctr. Although the cycle counts are way off, they seem to be off by a constant factor. Some other stats are also very different from the HW results (included in the additional info).

Some additional info:

  • I've been running the simulation in PTX mode. I tried using SASS, but tracing the application generates a huge file (I had to stop the tracer after a couple of minutes, and the file was already over 7GB). Do you know what might be causing this?
  • I am using a 16GB Tesla V100. I tried using the Tuner to get more accurate results and even tried different parameters (warp scheduling, memory scheduling), but this did not change the overall result.
  • Some simulated stats are pretty accurate (occupancy, l1hitreads, l2-read-transactions), while others are not (dram-read-transactions, l2-read-hits, warp-inst). I have also attached some of these results.
  • I ran the Rodinia benchmark, and it produced somewhat accurate results. They were much closer to HW than those of the app I am trying to run, whose errors are orders of magnitude larger.

Do you have any suggestions or ideas on how to solve these issues?

Thank you in advance!

@cesar-avalos3
Contributor

This is interesting: if I comment out the printfs inside the kernels, the workload's (128-ES) execution time on HW goes from 3.319 seconds to 42.785 µs, which is much closer to the reported simulation time. I don't know what shenanigans happen inside the device-side printf, but maybe we are not accounting for that in PTX execution mode. Will run some more tests.
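
As a rough sanity check on the numbers quoted in this thread, the arithmetic can be sketched as below. This is only a back-of-the-envelope comparison, not a measurement: the clock rate `ASSUMED_CLOCK_HZ` is an assumption (roughly the boost clock of a V100 SXM2, not stated anywhere in the thread), the helper name is made up for illustration, and the cycle counts and timings may come from different runs or workloads.

```python
# Figures quoted in this thread.
hw_cycles = 5e9               # ~5B cycles reported by Nsight / nvprof
sim_cycles = 137e3            # 137k cycles reported by the simulator
t_with_printf = 3.319         # seconds, kernels with device-side printf
t_without_printf = 42.785e-6  # seconds, printf calls commented out

# ASSUMPTION: core clock in Hz. The thread does not state the real clock,
# so treat every cycle estimate below as an order-of-magnitude figure.
ASSUMED_CLOCK_HZ = 1.53e9


def seconds_to_cycles(seconds, clock_hz=ASSUMED_CLOCK_HZ):
    """Convert a wall-clock duration to an approximate cycle count."""
    return seconds * clock_hz


# How far apart are the profiled and simulated cycle counts?
gap = hw_cycles / sim_cycles

# How much slower is the kernel with device-side printf enabled?
printf_slowdown = t_with_printf / t_without_printf

# Approximate cycle counts implied by the two timings.
est_cycles_with_printf = seconds_to_cycles(t_with_printf)
est_cycles_without_printf = seconds_to_cycles(t_without_printf)

print(f"HW / simulated cycles:         {gap:,.0f}x")
print(f"printf slowdown on HW:         {printf_slowdown:,.0f}x")
print(f"est. cycles with printf:       {est_cycles_with_printf:,.0f}")
print(f"est. cycles without printf:    {est_cycles_without_printf:,.0f}")
print(f"simulated cycles (from issue): {sim_cycles:,.0f}")
```

Under that assumed clock, the timing with printf lands in the same range as the profiled cycle count, and the timing without printf lands within the same order of magnitude as the simulated count, which would fit the idea that device-side printf accounts for most of the gap.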
