
GPU Memory Management Issue in Multi-Shank Configuration with Kilosort 4.0.16 #771

HiroMiyawaki opened this issue Sep 3, 2024 · 25 comments


@HiroMiyawaki

Describe the issue:

I am encountering what appears to be a GPU memory management issue when using the multi-shank configuration in Kilosort 4.0.16. Specifically, when processing data from a Neuropixels 2.0 probe in a 4-shank configuration (384 channels in total, sampled at 30 kHz) for approximately 60 minutes, I receive an error indicating a shortage of GPU memory (detailed error message provided below).

However, when I run Kilosort on data of similar duration (~60 minutes) but in a one-shank configuration (still 384 channels), it processes without any issues. Additionally, when I split the 4-shank dataset into individual shanks and process them separately (96 channels each), the operation also completes successfully, even for longer recordings (>300 minutes).
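For reference, the per-shank split was done roughly along these lines. This is only a sketch: it assumes int16 samples with channels interleaved per sample and a contiguous channel-to-shank mapping, which may not match your actual probe map.

```python
import numpy as np

N_CHAN, N_PER_SHANK, N_SHANKS = 384, 96, 4

def split_shanks(data):
    # data: (n_samples, 384) int16 array; returns one (n_samples, 96)
    # view per shank. Contiguous channel blocks are an assumption --
    # check the real channel order in your probe map before using this.
    return [data[:, s * N_PER_SHANK:(s + 1) * N_PER_SHANK]
            for s in range(N_SHANKS)]

# With a real recording you would memmap the .bin file instead:
#   data = np.memmap("recording.bin", dtype=np.int16).reshape(-1, N_CHAN)
#   for s, part in enumerate(split_shanks(data)):
#       part.tofile(f"recording_shank{s}.bin")
```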

Given this, I suspect that the multi-shank configuration might require significantly more GPU memory. Could you please confirm if this is the case? If so, is there a guideline for estimating the amount of GPU memory required based on the number of shanks and/or the length of the recording?
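As a back-of-envelope data point (my own arithmetic, not an official guideline), one float32 copy of a single batch alone is already sizeable:

```python
# One float32 copy of one batch. Actual GPU usage is several times this
# once filtering and template-matching buffers are included, so treat it
# only as a lower bound.
batch_size, n_chan, bytes_per_float32 = 60000, 384, 4
batch_bytes = batch_size * n_chan * bytes_per_float32
print(f"{batch_bytes / 2**20:.0f} MiB per batch")  # ~88 MiB
```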

Reproduce the bug:

call run_kilosort() with batch_size: 60000
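Roughly like this (a sketch; the probe file name and data path below are placeholders, not my actual files):

```python
# Settings passed to run_kilosort (sketch; the data path and probe file
# are placeholders, not the actual files from this recording):
settings = {
    "n_chan_bin": 384,              # 4 shanks x 96 channels
    "batch_size": 60000,            # 2 s per batch at 30 kHz
    "data_dir": "path/to/recording" # placeholder
}
# from kilosort import run_kilosort
# run_kilosort(settings=settings,
#              probe_name="NP2_4shank.mat")  # placeholder probe file
```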

Error message:

15:06 kilosort.run_kilosort ERROR    Encountered error in `run_kilosort`:
Traceback (most recent call last):
  File "c:\Users\---\anaconda3\envs\ks4\lib\site-packages\kilosort\run_kilosort.py", line 205, in run_kilosort
    ops, bfile, st0 = compute_drift_correction(
  File "c:\Users\---\anaconda3\envs\ks4\lib\site-packages\kilosort\run_kilosort.py", line 520, in compute_drift_correction
    ops, st = datashift.run(ops, bfile, device=device, progress_bar=progress_bar,
  File "c:\Users\---\anaconda3\envs\ks4\lib\site-packages\kilosort\datashift.py", line 198, in run
    st, _, ops  = spikedetect.run(
  File "c:\Users\---\anaconda3\envs\ks4\lib\site-packages\kilosort\spikedetect.py", line 253, in run
    xy, imax, amp, adist = template_match(X, ops, iC, iC2, weigh, device=device)
  File "c:\Users\---\anaconda3\envs\ks4\lib\site-packages\kilosort\spikedetect.py", line 159, in template_match
    Amax = torch.max(Aa[iC2], 0)[0]
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.19 GiB. GPU 0 has a total capacity of 15.99 GiB of which 1.52 GiB is free. Of the allocated memory 8.70 GiB is allocated by PyTorch, and 3.19 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
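The allocator hint from the traceback can be set in Python before torch is imported. I haven't verified that it resolves the fragmentation in this case, but it is cheap to try:

```python
import os

# Must be set *before* the first `import torch` in the process,
# or the allocator configuration is ignored.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# import torch  # ...then import torch and run sorting as usual
```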

Version information:

python: 3.9.19
Kilosort version: 4.0.16
os: Windows 11 Home
CUDA toolkit: 11.8

@RobertoDF
Contributor

I get the same problem. Did you try clear_cache=True? It helps sometimes, though not in my case.

@HiroMiyawaki
Author

I’ve tried the clear_cache option but encountered the same error.
Additionally, I confirmed that no other processes were occupying a significant amount (>1GB) of GPU memory.

@jacobpennington
Collaborator

@HiroMiyawaki Can you please upload kilosort4.log from the results directory so I can see more details? Also, if you're able to share the data with me, that would help me debug this faster.

@HiroMiyawaki
Author

kilosort4.log
Here is the logfile.

I'm OK to share the data, which is ~87GB. Do you have a preferred method for transferring the data?

@HiroMiyawaki
Author

The log file has some garbage in the last third; please ignore it.

@jacobpennington
Collaborator

@HiroMiyawaki Any kind of link you can post that I can download the data from is fine. Most people have been sending google drive or dropbox links. You can post it here if you're comfortable with that, or e-mail it to me at [email protected] if you don't want the link to be publicly visible.

@HiroMiyawaki
Author

@jacobpennington I've just sent you an e-mail.

@Sara-Brooke

Hi, I'm getting a similar error when running KS4, was this cuda memory issue ever resolved?

@jacobpennington
Collaborator

Still working on it. Can you please give some more details @Sara-Brooke, like attaching kilosort4.log?

@Peyton-D

Peyton-D commented Sep 25, 2024

I'm having the same issue using a single NP2.0 in 2- and even 1-shank configurations. The 2-shank sorting attempt got to 39% complete during the "kilosort.spikedetect: Re-computing universal templates from data" phase before stopping with a CUDA out-of-memory error. The 1-shank attempt got to the "first clustering" phase before stopping. I should also mention that just loading the data into the Kilosort GUI takes up ~3 GB of my 8 GB of dedicated GPU memory.

Recording size: 90 min, Kilosort version: 4.0.17, "Clear PyTorch Cache" = True.

kilosort4.log

@Sara-Brooke

Sara-Brooke commented Sep 26, 2024

I'm using an NP2.0 in a four-shank configuration with a recording of ~25 minutes, and I got the "CUDA out of memory" error at the start of spike detection. I'm still setting up my spike sorting, so unfortunately I don't have any successful runs to go off of. I am using a 12 GB GPU (GeForce RTX 3060), running KS4 from a terminal in a conda environment, on data collected in SpikeGLX and preprocessed with CatGT.
Attaching the log file for review! Thank you so much for the help; I'll update this thread if I find anything out.
kilosort4.log

Python: 3.9.19
Kilosort: 4 (I'm not sure which version inside 4, but I installed it very recently, so probably the latest)
OS: Windows 11
CUDA: 11.8

@Sara-Brooke

Sara-Brooke commented Oct 9, 2024

Okay, I actually got mine to work! I had to manually find the most up-to-date NVIDIA driver on their website (Device Manager lied to me; it was not actually up to date). Having the new driver on my GPU allowed me to install the newest CUDA version (compatibility checked by typing nvidia-smi in the conda terminal).
Current driver:
(screenshot of the installed driver attached)

Log File:
kilosort4_SB_successfulRun.log

So final (working) versions/equipment/packages:
Windows 11
GeForce RTX 3060
NVIDIA driver 561.09
CUDA 12.6
Kilosort 4.0.18
Python 3.9.2
torch 2.4.1

@jacobpennington
Collaborator

Great, thanks for letting us know!

@jacobpennington
Collaborator

Hi @HiroMiyawaki,

Can you please try sorting again with the latest version (v4.0.19)? There was a bug in the way template positions were generated for multi-shank probes, and fixing the bug reduced memory usage on your dataset by 75% for me.

@HiroMiyawaki
Author

Hello @jacobpennington,

KS 4.0.19 successfully processed a relatively short (~70 min) 4-shank recording, which was not possible with v4.0.16. However, for a longer (~390 min) 4-shank recording, KS 4.0.19 ran into an “out of memory” error (I’ve attached the log file).
kilosort4.log

I'm not sure whether this indicates another bug or whether a 390-min recording at 30 kHz is simply too large for my GPU (16 GB RAM). Note that the same data can be processed with KS 4.0.16 if each shank is processed separately.

@RobertoDF
Contributor

I had a similar error. You can try the version in the only open pull request to see if it fixes your problem too; I'm the author of that PR.

@HiroMiyawaki
Author

Hi @RobertoDF

It has been quite hectic for a while, but I finally had a chance to try your modification.

In short, it works!

Here are the details: I cloned the latest version a few days ago (the log indicates that it’s version 4.0.21.dev8+g44252a2.d20241115) and applied the modification as outlined in your pull request. The modified version successfully processed the ~390-minute, 4-shank dataset, and the results appear to be fine, at least in the Phy software.

I’ve attached the log file just in case.
kilosort4.log

Thanks a lot!

@RobertoDF
Contributor

Happy to hear that 🚀

@jacobpennington
Collaborator

@HiroMiyawaki Would you be willing to share your data again, for the longer recording? Just to help me test some memory improvements on a dataset that I know is running into this problem.

@STuoX

STuoX commented Jan 17, 2025

Hi,

My lab is using Kilosort to analyze Neuropixels 2.0 data. When running Kilosort 4, we noticed that there is no .mat file with 4 shanks when selecting the probe; it seems all the .mat files are single-shank. How did you deal with this issue? Did you create a .mat file yourself?

Thanks,
Tuo

@Lathomas42

I still have a recording that is quite long and cannot be sorted due to GPU memory issues. To try to work around them, I am sorting this 64-channel file in two 32-channel batches (one for each shank). I have also upgraded from a 1080 Ti to a 4070 with 16 GB of memory, and this is still not enough. This is surprising to me given that Kilosort is fine with Neuropixels data that has many times more channels than this. The recording itself is 9 hr at 30 kHz, which ends up detecting ~400,000,000 spikes before final clustering, at which point it crashes.

I was hoping there were some parameters I could change to reduce GPU load in the "Final Clustering" step. The one that immediately jumps out is 'cluster_downsampling'. Would that potentially help this step, or are there other parameters to try? I have also implemented the changes above, to no avail.
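If it helps anyone, this is how I'd pass the override. Whether it actually reduces peak memory in final clustering is an assumption on my part, and the value below is arbitrary:

```python
# cluster_downsampling is the KS4 parameter named above; raising it keeps
# a smaller subsample of spikes during clustering. The value 50 is
# arbitrary, and the memory effect here is unverified.
settings = {
    "n_chan_bin": 32,            # one 32-channel shank at a time
    "cluster_downsampling": 50,  # default is 20, if I recall correctly
}
# run_kilosort(settings=settings, ...)
```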

@jacobpennington Happy to provide a bin file if that helps.

Attached is my log.

kilosort4.log

@RobertoDF
Contributor

RobertoDF commented Jan 23, 2025

I see that the error happens at vexp = 2 * Xg @ Xc.T - (Xc**2).sum(1)

You should try this pull request #775 with:
pip install git+https://github.com/RobertoDF/Kilosort.git@Improved_memory_management_clustering_qr.kmeans_plusplus
It streamlines GPU memory use by avoiding unnecessary allocations, and that particular line is one of the ones modified.
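For anyone curious, the general trick for an expression like that is to compute it in row chunks so the large matmul and broadcast temporaries are never materialized all at once. A sketch of the idea (my own illustration, not the actual code from the PR):

```python
import torch

def vexp_chunked(Xg, Xc, chunk=1024):
    # Equivalent to: 2 * Xg @ Xc.T - (Xc ** 2).sum(1),
    # but processes `chunk` rows of Xg at a time so the intermediate
    # matmul/broadcast temporaries stay small (the output itself is
    # still allocated at full size).
    sq = (Xc ** 2).sum(1)
    out = torch.empty(Xg.shape[0], Xc.shape[0],
                      dtype=Xg.dtype, device=Xg.device)
    for i in range(0, Xg.shape[0], chunk):
        out[i:i + chunk] = 2 * Xg[i:i + chunk] @ Xc.T - sq
    return out
```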

@Lathomas42

Thanks @RobertoDF, however I do have that pulled, and it does not resolve the issue.

-Logan

@RobertoDF
Contributor

RobertoDF commented Jan 23, 2025

Hi Logan,

In my version, vexp = 2 * Xg @ Xc.T - (Xc**2).sum(1) is replaced; you can see it in kilosort/clustering_qr.py.

So the error cannot happen at that line. Can you check that your kilosort/clustering_qr.py file looks right and that you are actually on my version?

@jacobpennington
Collaborator

@Lathomas42 Thanks for the information. It does look like this is the same problem that I'm working on when I have time, so a bin file would be welcome. You can post a dropbox or google drive link here, or e-mail it to me at [email protected]. Or let me know if there's another sharing option you'd prefer, those just seem to be the easiest.
