Paraphrasing (largely copy-paste) Kazu's suggestion here:

Voxelize the entire TPC region instead of using segments for drifting. GPUs are designed for massively parallel computation, matrix multiplication in particular, and processors are advancing toward larger memory size and bandwidth rather than smarter use of individual cores. We can use sparse tensors to take advantage of this.

In the current version we use segments as the base representation and then convert them into pixels (a kind of 2D voxelization). The index map between segments and pixels has to be computed and carried around, and we have to write a CUDA kernel for each index conversion. This increases the manual work, reduces readability, and is prone to errors. It can also be incompatible with the batching that GPU computation often relies on.

Rewriting with sparse tensors would let us use community-developed functions and get full vectorization without writing our own kernel at every step. Potentially the time dimension could be batched as well. (I didn't understand this part. I also think it would make backtracking easier.) This would make larndsim more readable and more extendable.
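To make the idea concrete, here is a minimal CPU sketch of the sparse-voxelization step: sample points along each segment, map them to voxel indices, and coalesce duplicate indices into a COO-style (indices, values) pair. This is illustrative only, not larnd-sim's actual API; the function name, the fixed-step sampling, and the parameters are all assumptions. On GPU the same coalescing would be done with e.g. `torch.sparse_coo_tensor(...).coalesce()` instead of hand-written index-conversion kernels.

```python
import numpy as np

def voxelize_segments(starts, ends, charges, voxel_size, n_steps=16):
    """Deposit segment charge onto a sparse 3D voxel grid (illustrative sketch).

    starts, ends : (N, 3) arrays of segment endpoints
    charges      : (N,) charge per segment
    Returns a COO-style pair: unique voxel indices (M, 3) and the
    total charge accumulated in each voxel (M,).
    """
    # Sample each segment at n_steps points, splitting its charge evenly.
    t = np.linspace(0.0, 1.0, n_steps)[:, None, None]       # (S, 1, 1)
    points = starts[None] + t * (ends - starts)[None]        # (S, N, 3)
    ijk = np.floor(points / voxel_size).astype(np.int64)     # voxel index per sample
    q = np.broadcast_to(charges / n_steps, (n_steps, len(charges)))

    flat_ijk = ijk.reshape(-1, 3)
    flat_q = q.reshape(-1)
    # Coalesce duplicate indices -- the core sparse-tensor operation that
    # replaces the per-step segment->pixel index maps.
    uniq, inv = np.unique(flat_ijk, axis=0, return_inverse=True)
    values = np.zeros(len(uniq))
    np.add.at(values, inv, flat_q)
    return uniq, values
```

Because the output is already in COO form, downstream steps (drift, diffusion, pixel response) can operate on the index/value arrays directly, and charge is conserved by construction since every sample's share lands in exactly one voxel.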
I think it's a cool project, and a bigger step than #231. Again, I'm interested in this project but cannot promise time at the moment. There is no name tag on these (organic development); whoever has the interest and time should try it. Tagging people who might be interested: @jaafar-chakrani @mjkramer.