Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjusting bin size in percSpikesMissing for burst units #179

Open
ryanlash opened this issue Nov 8, 2024 · 30 comments
Open

Adjusting bin size in percSpikesMissing for burst units #179

ryanlash opened this issue Nov 8, 2024 · 30 comments

Comments

@ryanlash
Copy link

ryanlash commented Nov 8, 2024

Hi Julie!

I was wondering if you had any suggestions/recommendations for handling burst units with regards to the calculation of the percentage of missing spikes? Basically, most of our units are being mistakenly marked as MUA since amplitudes decay and the maxBin value is for the most part always equal to 1. I was thinking of adjusting to calculate bin size using a friedman-diaconis method but unsure of its utility in this scenario. Thoughts?
Screenshot 2024-11-08 122546
Screenshot 2024-11-08 123944

@Julie-Fabre
Copy link
Owner

Hi Ryan,

Thanks for your message! I can't really see the amplitudes in this plot (it looks like the plotting default is not working great in this case). Would you be able to zoom in a little on these two plots (or open the unit in phy and screenshot things there) ? I don't intuitively see why the % of missing spikes would fail in this way (over-estimating) for bursty units - it should usually under-estimate the percentage of missing spikes because the spike distribution is non-Gaussian and has a heavy tail toward the lower amplitudes.

image

@ryanlash
Copy link
Author

Hi Julie,

Here are some screenshots of the unit. And yes, this unit (like other burst units) are non-gaussian and does skew towards lower amplitudes. What I noticed is if the maxBin value is equal to 1 then there's a loop in the percMisssingSpikes that then marks the pMissing val=50 (which does make sense for non-burst units). I've found by decreasing bin size/increasing number of bins does resolve the issue.
Screenshot 2024-11-11 134225
Screenshot 2024-11-11 134203
Screenshot 2024-11-11 134002
Screenshot 2024-11-11 133914

@Julie-Fabre
Copy link
Owner

Interesting! What bin size / number of bins did you end up setting it to?

@ryanlash
Copy link
Author

Setting the bin size to 500 was working. However, I have actually figured out what the issue was. There were outlier spikes from adverse noise events happening for multiple sessions and multiple rodents at the end of each session across all channels affecting every unit (amplitude >1000). This then affected the distribution of bin locations. Because there were so few (<50 each), they weren't appearing in any of the windows in Phy. If you look at the original GUI output you can see them in the amplitude presence window. If anyone else runs into this issue, install the phy2 plugins here and remove outliers using the Mahalanobis distance plugin to avoid spending two weeks down a rabbit hole like me. Science :')

@Julie-Fabre
Copy link
Owner

Julie-Fabre commented Nov 13, 2024

Yes, good catch. I can see the spikes aren't displayed properly in bombcell because of these outliers (and then the histogram bins are not the correct size/ values to capture any meaningful amplitude variations to fit a Gaussian to).
One solution for you is to remove the outliers in phy, like you said (and thanks for pointing towards those phy plugins)
It would be nice to have a solution in bombcell though. I am not sure what the best solution would be- either checking if there are extreme outliers (and removing them) or defining the histogram bins in a 'smarter' way (maybe using the Freedman-Diaconis rule). I will have a think/ play around and see which solution works best. I'll leave this github issue open until then.
Thanks again for bringing this up!

@Julie-Fabre
Copy link
Owner

Hi Ryan,

I am playing around with how to get bombcell dto deal with amplitude outliers. I am curious to see if it works on your data (and also to look a bit more into what outliers look like). Would you be able to send the kilosort output (.npy) files to me? You can send them using email: [email protected]

@ryanlash
Copy link
Author

Hi Julie!

Just shared a folder with you let me know if you didn't get it.

@Julie-Fabre
Copy link
Owner

Hi Ryan,

Thank you very much for sending the files over! I have now implemented a check in bombcell to detect (and ignore) extreme amplitude outliers (see the plots below - anything above the purple line is detected as an "extreme outlier" and is not used to compute the percentage of spikes missing). In the future, I will refine this and have a separate check so these kinds of spikes are removed entirely from the dataset, but for now this will do :)
image
image

@ryanlash
Copy link
Author

ryanlash commented Dec 2, 2024

Hi Julie!

I am having trouble replicating these results and am seeing a lot of units with 50% missing spikes still. I've tried debugging and am able to get that plot to show for the first unit but run into this error when I try to go to the next unit. Thoughts?
image

@Julie-Fabre
Copy link
Owner

Hmm that's odd. I just re-ran bombcell on your data (using all possible combinations of parameters that might have an effect) and I couldn't reproduce the bug. This is the distribution of percentage of spikes missing I see - does your look different?
image
It might be worth double checking you've updated all bombcell files fully (the easiest is just re-downloading / cloning so that everything is sure to be up-to-date)

@Julie-Fabre
Copy link
Owner

If you still get the bug, could you also send me your parameter file (or the script or the parquet file, whichever) to see if I can reproduce it that way?

@ryanlash
Copy link
Author

ryanlash commented Dec 2, 2024

So that is the other thing, I am currently unable to get the other summary plots to appear along and the gui sometimes won't appear at all (which maybe a kilosort issue haven't fully looked into it yet.)
`function param = qualityParamValues(ephysMetaDir, rawFile, ephysKilosortPath, gain_to_uV, kilosortVersion)
% JF, Load a parameter structure defining extraction and
% classification parameters
% ------
% Inputs
% ------
% ephysMetaDir: dir() structure of the path to your .meta or .oebin meta
% file
% rawFile: character array defining the path where your uncompressed raw
% ephys data is
% ------
% Outputs
% ------
% param: matlab structure defining extraction and
% classification parameters (see bc_qualityParamValues for required fields
% and suggested starting values)
%
if nargin < 5
kilosortVersion = 4;
end
param = struct; %initialize structure

%% calculating quality metrics parameters
% plotting parameters
param.plotDetails = 0; % generates a lot of plots,
% mainly good if you running through the code line by line to check things,
% to debug, or to get nice plots for a presentation
param.plotGlobal = 1; % plot summary of quality metrics
param.verbose = 1; % update user on progress
param.reextractRaw = 0; % re extract raw waveforms or not

% saving parameters
param.saveAsTSV = 1; % additionally save outputs in .tsv file - this is
% useful if you want to use phy after bombcell: each quality metric value
% will appear as a column in the Cluster view
param.unitType_for_phy = 1; % whether to save the output of unitType in .tsv file for phy
if nargin < 3
warning('no ephys kilosort path defined in bc_qualityParamValues, will save output tsv file in the savePath location')
else
param.ephysKilosortPath = ephysKilosortPath;
end
param.saveMatFileForGUI = 1; % save certain outputs at .mat file - useful for GUI

% duplicate spikes parameters
param.removeDuplicateSpikes = 1;
param.duplicateSpikeWindow_s = 0.00001; % in seconds
param.saveSpikes_withoutDuplicates = 1;
param.recomputeDuplicateSpikes = 0;

% amplitude / raw waveform parameters
param.detrendWaveform = 1; % If this is set to 1, each raw extracted spike is
% detrended (we remove the best straight-fit line from the spike)
% using MATLAB's builtin function detrend.
param.nRawSpikesToExtract = 100; % how many raw spikes to extract for each unit
param.saveMultipleRaw = 0; % If you wish to save the nRawSpikesToExtract as well,
% currently needed if you want to run unit match https://github.com/EnnyvanBeest/UnitMatch
% to track chronic cells over days after this
param.decompressData = 0; % whether to decompress .cbin ephys data
if kilosortVersion == 4
param.spikeWidth = 61; % width in samples
else
param.spikeWidth = 82; % width in samples
end
if strcmp(rawFile, "NaN")
param.extractRaw = 0;
else
param.extractRaw = 1; %whether to extract raw waveforms or not
end
param.probeType = 1; % if you are using spikeGLX and your meta file does
% not contain information about your probe type for some reason
% specify it here: '1' for 1.0 (3Bs) and '2' for 2.0 (single or 4-shanks)
% For additional probe types, make a pull request with more
% information. If your spikeGLX meta file contains information about your probe
% type, or if you are using open ephys, this paramater wil be ignored.
param.computeSpatialDecay = 0;

% signal to noise ratio
if kilosortVersion == 4
param.waveformBaselineNoiseWindow = 10; %time in samples at beginning of times
% extracted to computer the mean raw waveform - this needs to be before the
% waveform starts
else
param.waveformBaselineNoiseWindow = 20; %time in samples at beginning of times
% extracted to computer the mean raw waveform - this needs to be before the
% waveform starts
end

% refractory period parameters
param.tauR_valuesMin = 1/1000; % refractory period time (s), usually 0.0020.
% If this value is different than param.tauR_valuesMax, bombcell will
% estimate the tauR value taking possible values between :
% param.tauR_valuesMin:param.tauR_valuesStep:param.tauR_valuesMax
param.tauR_valuesStep = 0.5/1000; % refractory period time (s) steps. Only
% used if param.tauR_valuesMin is different from param.tauR_valuesMax
param.tauR_valuesMax = 1/1000; % refractory period time (s), usually 0.0020
param.tauC = 0.1/1000; % censored period time (s) - this is to prevent duplicate spikes
param.hillOrLlobetMethod = 1; % 1 to use Hill et al method; 2 to use Llobet et al method

% percentage spikes missing parameters
param.computeTimeChunks = 0; % compute fraction refractory period violations
% and percent spikes missing for different time chunks
param.deltaTimeChunk = 360; %time in seconds

% presence ratio
param.presenceRatioBinSize = 60; % in seconds

% drift estimate
param.driftBinSize = 60; % in seconds
param.computeDrift = 0; % whether to compute each units drift. this is a
% critically slow step that takes around 2seconds per unit

% waveform parameters
if kilosortVersion == 4
param.waveformBaselineWindowStart = 1;
param.waveformBaselineWindowStop = 11; % in samples
else
param.waveformBaselineWindowStart = 20;
param.waveformBaselineWindowStop = 30; % in samples
end
param.minThreshDetectPeaksTroughs = 0.1; % this is multiplied by the max value
% in a units waveform to give the minimum prominence to detect peaks using
% matlab's findpeaks function.
param.normalizeSpDecay = 1; % whether to normalize spatial decay points relative to
% maximum - this makes the spatrial decay slop calculation more invariant to the
% spike-sorting algorithm used
param.spDecayLinFit = 1; % if false, use an exponential fit

% recording parameters
param.ephys_sample_rate = 20000; % samples per second
param.nChannels = 64; %number of recorded channels (including any sync channels)
% recorded in the raw data. This is usually 384 or 385 for neuropixels
% recordings
param.nSyncChannels = 1;
if ~isempty(ephysMetaDir)
param.ephysMetaFile = [ephysMetaDir.folder, filesep, ephysMetaDir.name];
param.gain_to_uV = NaN;
else
param.ephysMetaFile = 'NaN';
param.gain_to_uV = gain_to_uV;
end
param.rawFile = rawFile;

% distance metric parameters
param.computeDistanceMetrics = 0; % whether to compute distance metrics - this can be time consuming
param.nChannelsIsoDist = 4; % number of nearby channels to use in distance metric computation

%% classifying units into good/mua/noise parameters
% whether to classify non-somatic units
param.splitGoodAndMua_NonSomatic = 0;

% waveform
param.maxNPeaks = 2; % maximum number of peaks
param.maxNTroughs = 1; % maximum number of troughs
param.somatic = 1; % keep only somatic units, and reject non-somatic ones
param.minWvDuration = 100; % in us
param.maxWvDuration = 1150; % in us
param.minSpatialDecaySlope = -0.0008; % in a.u./um
param.minSpatialDecaySlopeExp = 0.01; % in a.u./um
param.maxSpatialDecaySlopeExp = 0.12; % in a.u./um
param.maxWvBaselineFraction = 0.3; % maximum absolute value in waveform baseline
% should not exceed this fraction of the waveform's abolute peak value
param.firstPeakRatio = 3; % if units have an initial peak before the trough,
% it must be at least firstPeakRatio times larger than the peak after the trough to qualify as a non-somatic unit.
param.minTroughToPeakRatio = 0.8; % peak must be less
param.minWidthFirstPeak = 4; % in samples
param.minMainPeakToTroughRatio = 5; % trough should be min 5 x bigger than 1rst peak to count as non-somatic
param.minWidthMainTrough = 5; % in samples

% distance metrics
param.isoDmin = 20; % minimum isolation distance value
param.lratioMax = 0.1; % maximum l-ratio value
param.ssMin = NaN; % minimum silhouette score

% other classification params
param.minAmplitude = 20; % in uV
param.maxRPVviolations = 0.1; % fraction
param.maxPercSpikesMissing = 20; % in percentage
param.minNumSpikes = 300; % number of spikes
param.maxDrift = 100; % in micrometers
param.minPresenceRatio = 0.7; % fraction
param.minSNR = 1;

end
`

@Julie-Fabre
Copy link
Owner

Thanks for sending that over, Ryan! Running bombcell with that file, I still don't get any errors. Could you please double check you've updated all bombcell files fully (the easiest is just re-downloading / cloning so that everything is sure to be up-to-date)

@ryanlash
Copy link
Author

ryanlash commented Dec 2, 2024

So I've been trying to run it using the getting started and keep running into this error:
image

@Julie-Fabre
Copy link
Owner

Hi again, I sadly can't reproduce any of these bugs. Can you confirm that you've updated all bombcell files? It might be easiest to completely delete your current bombcell folder re-download or re-clone it.

@ryanlash
Copy link
Author

ryanlash commented Dec 3, 2024

Hi Julie! So I was able to resolve these issues, however, the processing speed has slowed down quite a bit. I've tried using both the live editor and copying over the bombcell_pipeline script but both seem to be processing quite a bit slower. Is this expected with the time chunking in the percSpikesMissing script?

@Julie-Fabre
Copy link
Owner

Julie-Fabre commented Dec 9, 2024

Sorry, I missed your reply. That's odd, I don't see any speed changed on my end.
Have you changed any paramaters by any chance? Typically, if you use param.computeDrift=1, param.computeDistanceMetrics=1 or param.hillOrLlobetMethod=1, the code is slower. The code is also slower on the first run if you are extracting raw waveforms (param.extractRaw).

@ryanlash
Copy link
Author

ryanlash commented Dec 9, 2024

Hi Julie!

No worries, I'm also working on setting up UnitMatch so I've got plenty to work on! I reverted back to the previous version I was working with and was able to get it to work (<1min) with the only adjustment being the percMissingSpikes script. I'm testing on the most recent version with the only adjustment being the qualityParams file and ensured that all of these are set to zero and it's taking ~7 minutes for the same file. Despite setting the loadRawTraces to 0, it is still loading raw traces in the GUI. However, this version of the GUI does appear without any issue despite being on a landscape monitor and will be used for publication purposes :)

@Julie-Fabre
Copy link
Owner

Julie-Fabre commented Dec 9, 2024

Ah interesting that it has slowed down your machine. Sorry about that! In that case, the culprit must be an addition I made to function to prevent extreme outliers. If you comment out the current lines 46-53 does that speed things up again? If it does I will refactor that code / make an option to not slow down everything.

About the loadRawTraces, this was to load chunks of raw data in (that I've now disabled because it was slow and clunky). You can safely ignore that and thanks for flagging I've cleaned that up now :)

@Julie-Fabre
Copy link
Owner

Just to add - if you do want to disable showing the raw waveforms, you can set param.extractRaw = 0 before loading the GUI. Glad you like the GUI :)

@ryanlash
Copy link
Author

ryanlash commented Dec 9, 2024

I'm still experiencing anywhere from 7-12 minutes of processing time even with the updates. I was unable to comment out 46-51 in the percMissingSpikes without disrupting the rest of the script but I am able to integrate the updated script into the previous version I used which seems to work at recognizing the outliers and ignoring them while not sacrificing processing speed, so I don't believe that is what's slowing it down. I also noticed a couple of bugs in the new version namely that some of the variables are going unrecognized during the GUI visualization step (namely minPeakTroughRatio_nonSomatic). I'll be playing with it some more this week so I'll let you know if there's anything else I can find

@Julie-Fabre
Copy link
Owner

Julie-Fabre commented Dec 10, 2024

I also noticed a couple of bugs in the new version namely that some of the variables are going unrecognized during the GUI visualization step (namely minPeakTroughRatio_nonSomatic). I'll be playing with it some more this week so I'll let you know if there's anything else I can find

Sorry about that bug - should be fixed now

About the slow speed - could you run the main function with the MATLAB profiler? So run this:

profile on
[qMetric, unitType] = bc.qm.runAllQualityMetrics(param, spikeTimes_samples, spikeClusters, ...
        templateWaveforms, templateAmplitudes, pcFeatures, pcFeatureIdx, channelPositions, savePath);`
profile viewer

and show me the output.

@ryanlash
Copy link
Author

No need to apologize this has already saved me eons of script writing and analysis 😄 I started the profiler this morning but has surpassed the two-hour mark, so I'll start it again later today and let it run overnight.

@ryanlash
Copy link
Author

It crashed at some point overnight and here are the errors I got:

Missing param fields filled in with default values
{'maxScndPeakToTroughRatio_noise' }
{'maxMainPeakToTroughRatio_nonSomatic'}
{'minWidthFirstPeak_nonSomatic' }
{'minWidthMainTrough_nonSomatic' }
{'minTroughToPeak2Ratio_nonSomatic' }
{'maxPeak1ToPeak2Ratio_nonSomatic' }
The logical indices contain a true value outside of the array bounds.

Error in [bc.qm.removeDuplicateSpikes](matlab:matlab.lang.internal.introspective.errorDocCallback('bc.qm.removeDuplicateSpikes', 'C:\Users\rylash\bombcell-main4\bombcell-main+bc+qm\removeDuplicateSpikes.m', 71)) ([line 71](matlab: opentoline('C:\Users\rylash\bombcell-main4\bombcell-main+bc+qm\removeDuplicateSpikes.m',71,0)))
templateAmplitudes = templateAmplitudes(~duplicateSpikes_idx);

Error in [bc.qm.runAllQualityMetrics](matlab:matlab.lang.internal.introspective.errorDocCallback('bc.qm.runAllQualityMetrics', 'C:\Users\rylash\bombcell-main4\bombcell-main+bc+qm\runAllQualityMetrics.m', 95)) ([line 95](matlab: opentoline('C:\Users\rylash\bombcell-main4\bombcell-main+bc+qm\runAllQualityMetrics.m',95,0)))
bc.qm.removeDuplicateSpikes(spikeTimes_samples, spikeClusters, templateAmplitudes, ...

@Julie-Fabre
Copy link
Owner

Julie-Fabre commented Dec 12, 2024

Hi Ryan, Is this with the same dataset as above?

  • If it is: it runs fine on my end - could you make sure you haven't inadvertently overwritten/changed any of bombcell's functions?
  • If it is not: could you send your dataset over?

@ryanlash
Copy link
Author

Hi Julie! Check your email, I just sent over a couple of versions of bombcell and a sample file. Let me know if you have any issues getting access to the folders :)

@Julie-Fabre
Copy link
Owner

Julie-Fabre commented Dec 13, 2024

Great, thanks! I'll take a look at all of this Monday morning

@Julie-Fabre
Copy link
Owner

Hi Ryan, thanks for sending those over! I have taken a look and your dataset runs fine for me. I have not tried running with your bombcell versions. Could please double-check that you have the latest bombcell - I would recommend copying your parameter file somewhere safe and then completely deleting the bombcell folder and re-installing it. 

@Julie-Fabre
Copy link
Owner

Hi Ryan how are things going? Let me know if there is anything I can help with :)

@ryanlash
Copy link
Author

Hi Julie!

I was able to get bombcell to run correctly again and it works great! Currently, I'm trying to figure out how to run UnitMatch(and by proxy Bombcell) while incorporating the info.rhd meta file that is spit out when recording from Intan amplifiers. I have code to read in .rhd files, however, this likely will be more headache than it's worth trying to incorporate it. My initial thoughts are to just figure out a way to replace instances of reading the cbin files with rhd files or possibly creating copies of the info.rhd files and replace them with the format that can be read in by bombcell. Any thoughts/suggestions?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants