Adjusting bin size in percSpikesMissing for burst units #179

ryanlash · 2024-11-08T17:40:15Z

Hi Julie!

I was wondering if you had any suggestions/recommendations for handling burst units with regards to the calculation of the percentage of missing spikes? Basically, most of our units are being mistakenly marked as MUA since amplitudes decay and the maxBin value is for the most part always equal to 1. I was thinking of adjusting to calculate bin size using a friedman-diaconis method but unsure of its utility in this scenario. Thoughts?

Julie-Fabre · 2024-11-09T00:41:00Z

Hi Ryan,

Thanks for your message! I can't really see the amplitudes in this plot (it looks like the plotting default is not working great in this case). Would you be able to zoom in a little on these two plots (or open the unit in phy and screenshot things there) ? I don't intuitively see why the % of missing spikes would fail in this way (over-estimating) for bursty units - it should usually under-estimate the percentage of missing spikes because the spike distribution is non-Gaussian and has a heavy tail toward the lower amplitudes.

ryanlash · 2024-11-11T19:04:44Z

Hi Julie,

Here are some screenshots of the unit. And yes, this unit (like other burst units) are non-gaussian and does skew towards lower amplitudes. What I noticed is if the maxBin value is equal to 1 then there's a loop in the percMisssingSpikes that then marks the pMissing val=50 (which does make sense for non-burst units). I've found by decreasing bin size/increasing number of bins does resolve the issue.

Julie-Fabre · 2024-11-12T11:18:50Z

Interesting! What bin size / number of bins did you end up setting it to?

ryanlash · 2024-11-12T16:45:02Z

Setting the bin size to 500 was working. However, I have actually figured out what the issue was. There were outlier spikes from adverse noise events happening for multiple sessions and multiple rodents at the end of each session across all channels affecting every unit (amplitude >1000). This then affected the distribution of bin locations. Because there were so few (<50 each), they weren't appearing in any of the windows in Phy. If you look at the original GUI output you can see them in the amplitude presence window. If anyone else runs into this issue, install the phy2 plugins here and remove outliers using the Mahalanobis distance plugin to avoid spending two weeks down a rabbit hole like me. Science :')

Julie-Fabre · 2024-11-13T14:56:27Z

Yes, good catch. I can see the spikes aren't displayed properly in bombcell because of these outliers (and then the histogram bins are not the correct size/ values to capture any meaningful amplitude variations to fit a Gaussian to).
One solution for you is to remove the outliers in phy, like you said (and thanks for pointing towards those phy plugins)
It would be nice to have a solution in bombcell though. I am not sure what the best solution would be- either checking if there are extreme outliers (and removing them) or defining the histogram bins in a 'smarter' way (maybe using the Freedman-Diaconis rule). I will have a think/ play around and see which solution works best. I'll leave this github issue open until then.
Thanks again for bringing this up!

Julie-Fabre · 2024-11-25T16:47:02Z

Hi Ryan,

I am playing around with how to get bombcell dto deal with amplitude outliers. I am curious to see if it works on your data (and also to look a bit more into what outliers look like). Would you be able to send the kilosort output (.npy) files to me? You can send them using email: [email protected]

ryanlash · 2024-11-25T18:15:56Z

Hi Julie!

Just shared a folder with you let me know if you didn't get it.

Julie-Fabre · 2024-11-28T12:02:24Z

Hi Ryan,

Thank you very much for sending the files over! I have now implemented a check in bombcell to detect (and ignore) extreme amplitude outliers (see the plots below - anything above the purple line is detected as an "extreme outlier" and is not used to compute the percentage of spikes missing). In the future, I will refine this and have a separate check so these kinds of spikes are removed entirely from the dataset, but for now this will do :)

ryanlash · 2024-12-02T20:18:51Z

Hi Julie!

I am having trouble replicating these results and am seeing a lot of units with 50% missing spikes still. I've tried debugging and am able to get that plot to show for the first unit but run into this error when I try to go to the next unit. Thoughts?

Julie-Fabre · 2024-12-02T20:34:18Z

Hmm that's odd. I just re-ran bombcell on your data (using all possible combinations of parameters that might have an effect) and I couldn't reproduce the bug. This is the distribution of percentage of spikes missing I see - does your look different?

It might be worth double checking you've updated all bombcell files fully (the easiest is just re-downloading / cloning so that everything is sure to be up-to-date)

Julie-Fabre · 2024-12-02T20:35:58Z

If you still get the bug, could you also send me your parameter file (or the script or the parquet file, whichever) to see if I can reproduce it that way?

ryanlash · 2024-12-02T20:39:12Z

So that is the other thing, I am currently unable to get the other summary plots to appear along and the gui sometimes won't appear at all (which maybe a kilosort issue haven't fully looked into it yet.)
`function param = qualityParamValues(ephysMetaDir, rawFile, ephysKilosortPath, gain_to_uV, kilosortVersion)
% JF, Load a parameter structure defining extraction and
% classification parameters
% ------
% Inputs
% ------
% ephysMetaDir: dir() structure of the path to your .meta or .oebin meta
% file
% rawFile: character array defining the path where your uncompressed raw
% ephys data is
% ------
% Outputs
% ------
% param: matlab structure defining extraction and
% classification parameters (see bc_qualityParamValues for required fields
% and suggested starting values)
%
if nargin < 5
kilosortVersion = 4;
end
param = struct; %initialize structure

%% calculating quality metrics parameters
% plotting parameters
param.plotDetails = 0; % generates a lot of plots,
% mainly good if you running through the code line by line to check things,
% to debug, or to get nice plots for a presentation
param.plotGlobal = 1; % plot summary of quality metrics
param.verbose = 1; % update user on progress
param.reextractRaw = 0; % re extract raw waveforms or not

% saving parameters
param.saveAsTSV = 1; % additionally save outputs in .tsv file - this is
% useful if you want to use phy after bombcell: each quality metric value
% will appear as a column in the Cluster view
param.unitType_for_phy = 1; % whether to save the output of unitType in .tsv file for phy
if nargin < 3
warning('no ephys kilosort path defined in bc_qualityParamValues, will save output tsv file in the savePath location')
else
param.ephysKilosortPath = ephysKilosortPath;
end
param.saveMatFileForGUI = 1; % save certain outputs at .mat file - useful for GUI

% duplicate spikes parameters
param.removeDuplicateSpikes = 1;
param.duplicateSpikeWindow_s = 0.00001; % in seconds
param.saveSpikes_withoutDuplicates = 1;
param.recomputeDuplicateSpikes = 0;

% amplitude / raw waveform parameters
param.detrendWaveform = 1; % If this is set to 1, each raw extracted spike is
% detrended (we remove the best straight-fit line from the spike)
% using MATLAB's builtin function detrend.
param.nRawSpikesToExtract = 100; % how many raw spikes to extract for each unit
param.saveMultipleRaw = 0; % If you wish to save the nRawSpikesToExtract as well,
% currently needed if you want to run unit match https://github.com/EnnyvanBeest/UnitMatch
% to track chronic cells over days after this
param.decompressData = 0; % whether to decompress .cbin ephys data
if kilosortVersion == 4
param.spikeWidth = 61; % width in samples
else
param.spikeWidth = 82; % width in samples
end
if strcmp(rawFile, "NaN")
param.extractRaw = 0;
else
param.extractRaw = 1; %whether to extract raw waveforms or not
end
param.probeType = 1; % if you are using spikeGLX and your meta file does
% not contain information about your probe type for some reason
% specify it here: '1' for 1.0 (3Bs) and '2' for 2.0 (single or 4-shanks)
% For additional probe types, make a pull request with more
% information. If your spikeGLX meta file contains information about your probe
% type, or if you are using open ephys, this paramater wil be ignored.
param.computeSpatialDecay = 0;

% signal to noise ratio
if kilosortVersion == 4
param.waveformBaselineNoiseWindow = 10; %time in samples at beginning of times
% extracted to computer the mean raw waveform - this needs to be before the
% waveform starts
else
param.waveformBaselineNoiseWindow = 20; %time in samples at beginning of times
% extracted to computer the mean raw waveform - this needs to be before the
% waveform starts
end

% refractory period parameters
param.tauR_valuesMin = 1/1000; % refractory period time (s), usually 0.0020.
% If this value is different than param.tauR_valuesMax, bombcell will
% estimate the tauR value taking possible values between :
% param.tauR_valuesMin:param.tauR_valuesStep:param.tauR_valuesMax
param.tauR_valuesStep = 0.5/1000; % refractory period time (s) steps. Only
% used if param.tauR_valuesMin is different from param.tauR_valuesMax
param.tauR_valuesMax = 1/1000; % refractory period time (s), usually 0.0020
param.tauC = 0.1/1000; % censored period time (s) - this is to prevent duplicate spikes
param.hillOrLlobetMethod = 1; % 1 to use Hill et al method; 2 to use Llobet et al method

% percentage spikes missing parameters
param.computeTimeChunks = 0; % compute fraction refractory period violations
% and percent spikes missing for different time chunks
param.deltaTimeChunk = 360; %time in seconds

% presence ratio
param.presenceRatioBinSize = 60; % in seconds

% drift estimate
param.driftBinSize = 60; % in seconds
param.computeDrift = 0; % whether to compute each units drift. this is a
% critically slow step that takes around 2seconds per unit

% waveform parameters
if kilosortVersion == 4
param.waveformBaselineWindowStart = 1;
param.waveformBaselineWindowStop = 11; % in samples
else
param.waveformBaselineWindowStart = 20;
param.waveformBaselineWindowStop = 30; % in samples
end
param.minThreshDetectPeaksTroughs = 0.1; % this is multiplied by the max value
% in a units waveform to give the minimum prominence to detect peaks using
% matlab's findpeaks function.
param.normalizeSpDecay = 1; % whether to normalize spatial decay points relative to
% maximum - this makes the spatrial decay slop calculation more invariant to the
% spike-sorting algorithm used
param.spDecayLinFit = 1; % if false, use an exponential fit

% recording parameters
param.ephys_sample_rate = 20000; % samples per second
param.nChannels = 64; %number of recorded channels (including any sync channels)
% recorded in the raw data. This is usually 384 or 385 for neuropixels
% recordings
param.nSyncChannels = 1;
if ~isempty(ephysMetaDir)
param.ephysMetaFile = [ephysMetaDir.folder, filesep, ephysMetaDir.name];
param.gain_to_uV = NaN;
else
param.ephysMetaFile = 'NaN';
param.gain_to_uV = gain_to_uV;
end
param.rawFile = rawFile;

% distance metric parameters
param.computeDistanceMetrics = 0; % whether to compute distance metrics - this can be time consuming
param.nChannelsIsoDist = 4; % number of nearby channels to use in distance metric computation

%% classifying units into good/mua/noise parameters
% whether to classify non-somatic units
param.splitGoodAndMua_NonSomatic = 0;

% waveform
param.maxNPeaks = 2; % maximum number of peaks
param.maxNTroughs = 1; % maximum number of troughs
param.somatic = 1; % keep only somatic units, and reject non-somatic ones
param.minWvDuration = 100; % in us
param.maxWvDuration = 1150; % in us
param.minSpatialDecaySlope = -0.0008; % in a.u./um
param.minSpatialDecaySlopeExp = 0.01; % in a.u./um
param.maxSpatialDecaySlopeExp = 0.12; % in a.u./um
param.maxWvBaselineFraction = 0.3; % maximum absolute value in waveform baseline
% should not exceed this fraction of the waveform's abolute peak value
param.firstPeakRatio = 3; % if units have an initial peak before the trough,
% it must be at least firstPeakRatio times larger than the peak after the trough to qualify as a non-somatic unit.
param.minTroughToPeakRatio = 0.8; % peak must be less
param.minWidthFirstPeak = 4; % in samples
param.minMainPeakToTroughRatio = 5; % trough should be min 5 x bigger than 1rst peak to count as non-somatic
param.minWidthMainTrough = 5; % in samples

% distance metrics
param.isoDmin = 20; % minimum isolation distance value
param.lratioMax = 0.1; % maximum l-ratio value
param.ssMin = NaN; % minimum silhouette score

% other classification params
param.minAmplitude = 20; % in uV
param.maxRPVviolations = 0.1; % fraction
param.maxPercSpikesMissing = 20; % in percentage
param.minNumSpikes = 300; % number of spikes
param.maxDrift = 100; % in micrometers
param.minPresenceRatio = 0.7; % fraction
param.minSNR = 1;

end
`

Julie-Fabre · 2024-12-02T20:41:12Z

Thanks for sending that over, Ryan! Running bombcell with that file, I still don't get any errors. Could you please double check you've updated all bombcell files fully (the easiest is just re-downloading / cloning so that everything is sure to be up-to-date)

ryanlash · 2024-12-02T21:30:34Z

So I've been trying to run it using the getting started and keep running into this error:

Julie-Fabre · 2024-12-02T22:24:43Z

Hi again, I sadly can't reproduce any of these bugs. Can you confirm that you've updated all bombcell files? It might be easiest to completely delete your current bombcell folder re-download or re-clone it.

ryanlash · 2024-12-03T20:19:42Z

Hi Julie! So I was able to resolve these issues, however, the processing speed has slowed down quite a bit. I've tried using both the live editor and copying over the bombcell_pipeline script but both seem to be processing quite a bit slower. Is this expected with the time chunking in the percSpikesMissing script?

Julie-Fabre · 2024-12-09T16:42:13Z

Sorry, I missed your reply. That's odd, I don't see any speed changed on my end.
Have you changed any paramaters by any chance? Typically, if you use param.computeDrift=1, param.computeDistanceMetrics=1 or param.hillOrLlobetMethod=1, the code is slower. The code is also slower on the first run if you are extracting raw waveforms (param.extractRaw).

ryanlash · 2024-12-09T17:31:08Z

Hi Julie!

No worries, I'm also working on setting up UnitMatch so I've got plenty to work on! I reverted back to the previous version I was working with and was able to get it to work (<1min) with the only adjustment being the percMissingSpikes script. I'm testing on the most recent version with the only adjustment being the qualityParams file and ensured that all of these are set to zero and it's taking ~7 minutes for the same file. Despite setting the loadRawTraces to 0, it is still loading raw traces in the GUI. However, this version of the GUI does appear without any issue despite being on a landscape monitor and will be used for publication purposes :)

Julie-Fabre · 2024-12-09T18:04:50Z

Ah interesting that it has slowed down your machine. Sorry about that! In that case, the culprit must be an addition I made to function to prevent extreme outliers. If you comment out the current lines 46-53 does that speed things up again? If it does I will refactor that code / make an option to not slow down everything.

About the loadRawTraces, this was to load chunks of raw data in (that I've now disabled because it was slow and clunky). You can safely ignore that and thanks for flagging I've cleaned that up now :)

Julie-Fabre · 2024-12-09T18:18:00Z

Just to add - if you do want to disable showing the raw waveforms, you can set param.extractRaw = 0 before loading the GUI. Glad you like the GUI :)

ryanlash · 2024-12-09T22:01:54Z

I'm still experiencing anywhere from 7-12 minutes of processing time even with the updates. I was unable to comment out 46-51 in the percMissingSpikes without disrupting the rest of the script but I am able to integrate the updated script into the previous version I used which seems to work at recognizing the outliers and ignoring them while not sacrificing processing speed, so I don't believe that is what's slowing it down. I also noticed a couple of bugs in the new version namely that some of the variables are going unrecognized during the GUI visualization step (namely minPeakTroughRatio_nonSomatic). I'll be playing with it some more this week so I'll let you know if there's anything else I can find

Julie-Fabre · 2024-12-10T10:38:16Z

I also noticed a couple of bugs in the new version namely that some of the variables are going unrecognized during the GUI visualization step (namely minPeakTroughRatio_nonSomatic). I'll be playing with it some more this week so I'll let you know if there's anything else I can find

Sorry about that bug - should be fixed now

About the slow speed - could you run the main function with the MATLAB profiler? So run this:

profile on
[qMetric, unitType] = bc.qm.runAllQualityMetrics(param, spikeTimes_samples, spikeClusters, ...
        templateWaveforms, templateAmplitudes, pcFeatures, pcFeatureIdx, channelPositions, savePath);`
profile viewer

and show me the output.

ryanlash · 2024-12-10T16:54:02Z

No need to apologize this has already saved me eons of script writing and analysis 😄 I started the profiler this morning but has surpassed the two-hour mark, so I'll start it again later today and let it run overnight.

ryanlash · 2024-12-11T15:35:31Z

It crashed at some point overnight and here are the errors I got:

Missing param fields filled in with default values
{'maxScndPeakToTroughRatio_noise' }
{'maxMainPeakToTroughRatio_nonSomatic'}
{'minWidthFirstPeak_nonSomatic' }
{'minWidthMainTrough_nonSomatic' }
{'minTroughToPeak2Ratio_nonSomatic' }
{'maxPeak1ToPeak2Ratio_nonSomatic' }
The logical indices contain a true value outside of the array bounds.

Error in [bc.qm.removeDuplicateSpikes](matlab:matlab.lang.internal.introspective.errorDocCallback('bc.qm.removeDuplicateSpikes', 'C:\Users\rylash\bombcell-main4\bombcell-main+bc+qm\removeDuplicateSpikes.m', 71)) ([line 71](matlab: opentoline('C:\Users\rylash\bombcell-main4\bombcell-main+bc+qm\removeDuplicateSpikes.m',71,0)))
templateAmplitudes = templateAmplitudes(~duplicateSpikes_idx);

Error in [bc.qm.runAllQualityMetrics](matlab:matlab.lang.internal.introspective.errorDocCallback('bc.qm.runAllQualityMetrics', 'C:\Users\rylash\bombcell-main4\bombcell-main+bc+qm\runAllQualityMetrics.m', 95)) ([line 95](matlab: opentoline('C:\Users\rylash\bombcell-main4\bombcell-main+bc+qm\runAllQualityMetrics.m',95,0)))
bc.qm.removeDuplicateSpikes(spikeTimes_samples, spikeClusters, templateAmplitudes, ...

Julie-Fabre · 2024-12-12T13:22:47Z

Hi Ryan, Is this with the same dataset as above?

If it is: it runs fine on my end - could you make sure you haven't inadvertently overwritten/changed any of bombcell's functions?
If it is not: could you send your dataset over?

ryanlash · 2024-12-12T19:59:36Z

Hi Julie! Check your email, I just sent over a couple of versions of bombcell and a sample file. Let me know if you have any issues getting access to the folders :)

Julie-Fabre · 2024-12-13T14:10:49Z

Great, thanks! I'll take a look at all of this Monday morning

Julie-Fabre · 2024-12-16T13:15:27Z

Hi Ryan, thanks for sending those over! I have taken a look and your dataset runs fine for me. I have not tried running with your bombcell versions. Could please double-check that you have the latest bombcell - I would recommend copying your parameter file somewhere safe and then completely deleting the bombcell folder and re-installing it.

Julie-Fabre · 2024-12-22T12:27:11Z

Hi Ryan how are things going? Let me know if there is anything I can help with :)

ryanlash · 2024-12-23T19:58:31Z

Hi Julie!

I was able to get bombcell to run correctly again and it works great! Currently, I'm trying to figure out how to run UnitMatch(and by proxy Bombcell) while incorporating the info.rhd meta file that is spit out when recording from Intan amplifiers. I have code to read in .rhd files, however, this likely will be more headache than it's worth trying to incorporate it. My initial thoughts are to just figure out a way to replace instances of reading the cbin files with rhd files or possibly creating copies of the info.rhd files and replace them with the format that can be read in by bombcell. Any thoughts/suggestions?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adjusting bin size in percSpikesMissing for burst units #179

Adjusting bin size in percSpikesMissing for burst units #179

ryanlash commented Nov 8, 2024

Julie-Fabre commented Nov 9, 2024

ryanlash commented Nov 11, 2024

Julie-Fabre commented Nov 12, 2024

ryanlash commented Nov 12, 2024

Julie-Fabre commented Nov 13, 2024 •

edited

Loading

Julie-Fabre commented Nov 25, 2024

ryanlash commented Nov 25, 2024

Julie-Fabre commented Nov 28, 2024

ryanlash commented Dec 2, 2024

Julie-Fabre commented Dec 2, 2024

Julie-Fabre commented Dec 2, 2024

ryanlash commented Dec 2, 2024 •

edited

Loading

Julie-Fabre commented Dec 2, 2024

ryanlash commented Dec 2, 2024 •

edited

Loading

Julie-Fabre commented Dec 2, 2024

ryanlash commented Dec 3, 2024

Julie-Fabre commented Dec 9, 2024 •

edited

Loading

ryanlash commented Dec 9, 2024

Julie-Fabre commented Dec 9, 2024 •

edited

Loading

Julie-Fabre commented Dec 9, 2024

ryanlash commented Dec 9, 2024

Julie-Fabre commented Dec 10, 2024 •

edited

Loading

ryanlash commented Dec 10, 2024

ryanlash commented Dec 11, 2024

Julie-Fabre commented Dec 12, 2024 •

edited

Loading

ryanlash commented Dec 12, 2024

Julie-Fabre commented Dec 13, 2024 •

edited

Loading

Julie-Fabre commented Dec 16, 2024

Julie-Fabre commented Dec 22, 2024

ryanlash commented Dec 23, 2024

Adjusting bin size in percSpikesMissing for burst units #179

Adjusting bin size in percSpikesMissing for burst units #179

Comments

ryanlash commented Nov 8, 2024

Julie-Fabre commented Nov 9, 2024

ryanlash commented Nov 11, 2024

Julie-Fabre commented Nov 12, 2024

ryanlash commented Nov 12, 2024

Julie-Fabre commented Nov 13, 2024 • edited Loading

Julie-Fabre commented Nov 25, 2024

ryanlash commented Nov 25, 2024

Julie-Fabre commented Nov 28, 2024

ryanlash commented Dec 2, 2024

Julie-Fabre commented Dec 2, 2024

Julie-Fabre commented Dec 2, 2024

ryanlash commented Dec 2, 2024 • edited Loading

Julie-Fabre commented Dec 2, 2024

ryanlash commented Dec 2, 2024 • edited Loading

Julie-Fabre commented Dec 2, 2024

ryanlash commented Dec 3, 2024

Julie-Fabre commented Dec 9, 2024 • edited Loading

ryanlash commented Dec 9, 2024

Julie-Fabre commented Dec 9, 2024 • edited Loading

Julie-Fabre commented Dec 9, 2024

ryanlash commented Dec 9, 2024

Julie-Fabre commented Dec 10, 2024 • edited Loading

ryanlash commented Dec 10, 2024

ryanlash commented Dec 11, 2024

Julie-Fabre commented Dec 12, 2024 • edited Loading

ryanlash commented Dec 12, 2024

Julie-Fabre commented Dec 13, 2024 • edited Loading

Julie-Fabre commented Dec 16, 2024

Julie-Fabre commented Dec 22, 2024

ryanlash commented Dec 23, 2024

Julie-Fabre commented Nov 13, 2024 •

edited

Loading

ryanlash commented Dec 2, 2024 •

edited

Loading

ryanlash commented Dec 2, 2024 •

edited

Loading

Julie-Fabre commented Dec 9, 2024 •

edited

Loading

Julie-Fabre commented Dec 9, 2024 •

edited

Loading

Julie-Fabre commented Dec 10, 2024 •

edited

Loading

Julie-Fabre commented Dec 12, 2024 •

edited

Loading

Julie-Fabre commented Dec 13, 2024 •

edited

Loading