Add standardized framework for wearable validation #78
I think it could be useful to have a Python implementation of the analytical pipeline for testing sleep-tracking wearables, originally developed in R by @Luca-Menghini and @SRI-human-sleep: https://github.com/SRI-human-sleep/sleep-trackers-performance

More broadly, this analytical pipeline could be used to compare the performance of any sleep staging algorithm against a ground-truth reference (with 2, 4, or 5 stages). We should also support evaluating the performance against a ground-truth consensus scoring (i.e., 2 or more experts per record).

I would love some help on this if anyone would like to contribute.

Comments
Oh I love this. I'm messing with some actigraphy these days, as well as multirater agreement for something else, so I would be happy to try and work on this. I think this pipeline (especially the consensus scoring) will be very valuable, since it's easy to predict that reviewers will ask for this kind of thing in a manuscript that uses YASA. I'm not sure about a timeframe I could promise, but you don't seem like you're in a big rush. I would probably work on this intermittently over a few months. Also, I'll leave space for someone with more experience and/or time to jump in and take charge of this too.

What were you thinking @raphaelvallat -- just port those R functions over into a new module? At a glance, it seems the way they have it set up is that you run each function to generate each plot. Or would you want some kind of condensed all-in-one interface?

Do you have a sample dataset in mind? I suppose the main use you're considering is between the YASA staging and human raters, but since the main idea of the original paper is to compare with actigraphy, which might have different epoch sizes, it seems like the pipeline should be built around that (more difficult) use case to ensure proper handling. I'm sure there's an actigraphy vs. PSG dataset out there somewhere, hopefully with more than 1 PSG scorer.
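On the consensus-scoring side raised in the issue description (and the multirater agreement mentioned above), one simple approach is a per-epoch majority vote across scorers. A minimal sketch with integer-coded stages; the function name is hypothetical and not part of YASA:

```python
import numpy as np
from scipy import stats

def consensus_hypnogram(hypnos):
    """Majority-vote consensus across raters.

    hypnos : 2D array of shape (n_raters, n_epochs)
        Integer-coded sleep stages, one row per scorer.
    """
    hypnos = np.asarray(hypnos)
    # scipy.stats.mode (scipy >= 1.9 for keepdims) returns the most
    # frequent value per epoch; ties resolve to the smallest value.
    consensus, _ = stats.mode(hypnos, axis=0, keepdims=False)
    return consensus

# Three scorers disagreeing on two epochs (0=Wake, 1=NREM, 2=REM)
scorers = [[0, 1, 1, 2, 2],
           [0, 1, 2, 2, 1],
           [0, 1, 1, 2, 2]]
print(consensus_hypnogram(scorers))  # [0 1 1 2 2]
```

A real implementation would also need a tie-breaking policy for an even number of scorers (e.g., defer to the most experienced rater), which a plain mode does not capture.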
Would love this as well! I'm not a Python user, but the @SRI-human-sleep team and I are completely available for any clarification on the R functions and any other aspect of the pipeline, so feel free to write us or set up a call! We might also provide/simulate some datasets if needed, but consider keeping the focus on both binary classifications (e.g., actigraphic scores of sleep/wake) and devices providing 3+ categories (e.g., commercial trackers providing sleep staging), which are becoming increasingly popular.
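For the binary case mentioned here, device-validation papers typically report sensitivity (sleep detection) and specificity (wake detection) alongside accuracy. A minimal sketch of those metrics, with hypothetical names (not YASA API):

```python
import numpy as np

def binary_performance(ref, dev, sleep=1, wake=0):
    """Sensitivity, specificity, and accuracy for sleep/wake scoring.

    ref, dev : array-like of 0/1 epoch scores (reference and device).
    Sensitivity = proportion of reference sleep epochs scored as sleep;
    specificity = proportion of reference wake epochs scored as wake.
    """
    ref, dev = np.asarray(ref), np.asarray(dev)
    sensitivity = np.mean(dev[ref == sleep] == sleep)
    specificity = np.mean(dev[ref == wake] == wake)
    accuracy = np.mean(ref == dev)
    return sensitivity, specificity, accuracy

ref = [1, 1, 1, 0, 0, 1, 1, 0]
dev = [1, 1, 0, 0, 1, 1, 1, 0]
sens, spec, acc = binary_performance(ref, dev)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, accuracy={acc:.2f}")
# sensitivity=0.80, specificity=0.67, accuracy=0.75
```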
Thank you so much for your quick replies @remrama and @Luca-Menghini! And thanks for offering your help @remrama, there is no rush on this so your timeline sounds great — and I'd be happy to help along the way! @Luca-Menghini if you have some examples of wearable datasets, that would be so helpful. Ideally, we should be flexible and support actigraphy (sleep/wake), wearables (3 or 4 classes), and polysomnography (5 classes).

@remrama I'm actually not sure how the function should look. I guess my first choice would be a single Python class with various methods, to avoid redundancy across functions. Something like:

```python
class PerformanceComparison:

    def __init__(self, y_true, y_pred, stages=["WAKE", "NREM", "REM"]):
        ...

    def discrepancy_analysis(self):
        ...

    def ebe_analysis(self):
        ...

    def plot_confusion_matrix(self):
        ...

    def plot_bland_altman(self):
        ...
```

etc., where `y_true` and `y_pred` would be the reference and device scorings, respectively. Let me know what you think!
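As a rough illustration of what methods like `ebe_analysis()` and `plot_confusion_matrix()` might compute under the hood, here is a scikit-learn-based sketch on simulated scorings. Everything here (the simulated data, the 80% agreement rate) is made up for demonstration; none of it is YASA API:

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score, confusion_matrix

stages = ["WAKE", "NREM", "REM"]
rng = np.random.default_rng(42)

# y_true = reference (e.g., PSG) scoring, y_pred = device scoring.
# Simulate one night of 30-s epochs with ~80% epoch-by-epoch agreement.
y_true = rng.choice(stages, size=960)
y_pred = np.where(rng.random(960) < 0.8, y_true,
                  rng.choice(stages, size=960))

print(confusion_matrix(y_true, y_pred, labels=stages))
print(f"Cohen's kappa: {cohen_kappa_score(y_true, y_pred):.2f}")
```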
Yep, that looks like a great setup @raphaelvallat. Even if you wanted to restructure it later (not expecting that), I think this is a straightforward way of getting the general output built. I could just work on porting each of @Luca-Menghini's R functions over into Python/YASA within a single class and then go from there.
Making some progress on this (finally). @Luca-Menghini do you have any relevant datasets you're able to share? It might help to validate this pipeline if we could use the same data you used in the Sleep paper, but of course that might not be possible. If you're able to share a dataset but it needs to stay private, I guess we could email instead, just let me know.

Another ideal-but-maybe-not-possible consideration: for the sake of a notebook tutorial, it'd be best if we could share at least a subset of whatever dataset we end up using. Let me know what you think, thanks.
For the record -- we had an external meeting about this with the SRI team and made a plan for moving forward. The current plan is for SRI to wrap up their own Python implementation of the Menghini paper pipeline, and then we'll lead a port of the essentials over into YASA.
I've made some progress here. Note that I don't have the tutorial focused on wearable devices or actigraphy per se, but everything generalizes very easily. I added a simple function that converts PSG-based sleep stats to wearable-based sleep stats (i.e., it groups N1+N2 into "Light" sleep and renames N3 to "Deep" sleep). Add that step and then it's all the same.

I think there are more features that could be added, but this might be enough output for a first merge with YASA. The code and documentation need to be cleaned, but I'm wondering whether you think the current structure and output are good for a future pull request. If so, I'll start cleaning it up before submitting the request. If not, let me know what you think should be added before a first formal merge. This notebook on my fork gives a rundown of current features.
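The PSG-to-wearable conversion described in this comment amounts to a label mapping. A minimal sketch with assumed label names (the actual function in the fork may look different):

```python
# Collapse 5-stage PSG scoring into the 4 stages typically reported by wearables.
PSG_TO_WEARABLE = {
    "WAKE": "WAKE",
    "N1": "LIGHT",   # N1 + N2 are grouped as "Light" sleep
    "N2": "LIGHT",
    "N3": "DEEP",    # N3 is renamed "Deep" sleep
    "REM": "REM",
}

def collapse_stages(hypno, mapping=PSG_TO_WEARABLE):
    """Map a sequence of PSG stage labels to wearable-style labels."""
    return [mapping[stage] for stage in hypno]

print(collapse_stages(["WAKE", "N1", "N2", "N3", "REM"]))
# ['WAKE', 'LIGHT', 'LIGHT', 'DEEP', 'REM']
```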
@remrama this is really great work! I just had a look at the notebook and it's a great direction. Loved the random hypnogram generation :D

Instead of the new function to convert PSG-based sleep stats to wearables, I would edit the sleepstats function to work natively with 2, 3, 4, or 5 classes. In this function — as well as in the two new classes that you proposed — there should be a parameter indicating whether the data come from a 2-, 3-, 4-, or 5-stage scoring. This would then determine the behavior of all the underlying methods/output; for example, the tick labels would be automatically set in the plotting functions. Such flexibility would, however, require a strict input format for the hypnogram. I would suggest a fixed set of accepted values for each case (see the sketch below for one possible shape).
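One possible shape for that strict input format is a registry of accepted labels keyed by the number of stages. The particular labels below are an illustration consistent with the stage names used elsewhere in this thread, not necessarily the exact list the author had in mind:

```python
# Hypothetical accepted stage labels, keyed by the number of classes.
STAGE_LABELS = {
    2: ["WAKE", "SLEEP"],
    3: ["WAKE", "NREM", "REM"],
    4: ["WAKE", "LIGHT", "DEEP", "REM"],
    5: ["WAKE", "N1", "N2", "N3", "REM"],
}

def get_labels(n_stages):
    """Return the accepted labels for an n-stage scoring.

    Plotting methods could use these directly as tick labels,
    so that a single n_stages parameter drives all outputs.
    """
    if n_stages not in STAGE_LABELS:
        raise ValueError(f"n_stages must be one of {sorted(STAGE_LABELS)}")
    return STAGE_LABELS[n_stages]
```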
That's a great idea. I'll switch to that, clean up the code (black formatting, etc.), and then add a few other small features I've been thinking about and reach back out.
Ya, same -- I was happy with that :) It's beyond the scope of what I can do right now, but at some point I think a more advanced version of that -- like one that takes all the