Check out my two YouTube channels for more:
- Mrzaizai2k - AI (NEW)
- Mrzaizai2k (old)
This project was created to distinguish between COVID and non-COVID patients based on cough sounds. Since this is a non-invasive, low-cost, and simple procedure, we could use it as a first filter before testing with PCR or other methods. In phase 1, I attempted to understand the challenges of this method, the dataset, and how to deal with audio files, and, for the first time, integrated SMOTE and K-fold cross validation.
I'm sorry, but I don't have a '.py' file for you this time. But, as always, you can test it on my Kaggle notebook by clicking HERE.
To make amends, I'll go over what I did in this project in more detail:
- 1. Dataset
- 2. Primary features
- 3. Scaling data
- 4. Imbalanced data
- 5. Model
- 6. K-fold cross validation
- 7. Result
The dataset I used here is from the AICovidVN115m contest, which includes 669 positive cases (16.45% of the total) and 3,399 negative cases. I didn't extract the features from the .wav files myself because it's not really necessary right now; I took the raw features that had already been extracted in this repo.
The features extracted in this project were MFCCs, Mel-frequency spectrogram, and chroma, each reduced to its mean over time with np.mean.
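For context, extracting those mean features with librosa might look like the minimal sketch below (parameter values such as n_mfcc are illustrative assumptions, and extract_features is a hypothetical helper, not the repo's actual code):

```python
import numpy as np
import librosa

def extract_features(path, n_mfcc=40):
    """Load one cough recording and return a 1D vector of mean features."""
    y, sr = librosa.load(path)  # mono waveform at librosa's default 22050 Hz
    # MFCCs, Mel spectrogram and chroma, each averaged over the time axis
    mfcc = np.mean(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc), axis=1)
    mel = np.mean(librosa.feature.melspectrogram(y=y, sr=sr), axis=1)
    chroma = np.mean(librosa.feature.chroma_stft(y=y, sr=sr), axis=1)
    return np.concatenate([mfcc, mel, chroma])
```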
"Just to give you an example — if you have multiple independent variables like age, salary, and height; With their range as (18–100 Years), (25,000–75,000 Euros), and (1–2 Meters) respectively, feature scaling would help them all to be in the same range, for example- centered around 0 or in the range (0,1) depending on the scaling technique."
Reference: https://www.atoti.io/when-to-perform-a-feature-scaling/
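In practice, a scaler like scikit-learn's StandardScaler does exactly this; a minimal sketch (X_train and X_test are placeholder arrays, and the notebook's actual scaler choice may differ):

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()  # centers each feature around 0 with unit variance
X_train_scaled = scaler.fit_transform(X_train)  # fit statistics on training data only
X_test_scaled = scaler.transform(X_test)        # reuse the same statistics at test time
```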
As you can see, our dataset is imbalanced: 669 positive cases (16.45% of the total) versus 3,399 negative cases. There are a lot of proposed methods to deal with this, such as:
- Oversampling
- Undersampling
- Hybrid over- and undersampling
- Gathering more data
- Data augmentation (I will use it for my next COVID classification phase 2; a sketch follows this list), for example:
  - Time stretch
  - Pitch shift
  - Gain
  - Background noise, and so on...
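Since phase 2 will lean on audio augmentation, here is a minimal sketch of those four augmentations with librosa and NumPy (the stretch rate, pitch steps, gain factor, and noise level are arbitrary illustration values):

```python
import numpy as np
import librosa

def augment(y, sr):
    """Return four augmented variants of the waveform y."""
    stretched = librosa.effects.time_stretch(y, rate=1.1)       # time stretch
    shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)  # pitch shift
    gained = 1.5 * y                                            # simple gain
    noisy = y + 0.005 * np.random.randn(len(y))                 # background noise
    return stretched, shifted, gained, noisy
```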
In this project, I'll try resolving the imbalanced data by oversampling with SMOTE. For me the result is not entirely convincing: SMOTE synthesizes new feature vectors, and we don't know how it changes them or whether the new features would occur in real life. I guess in phase 2 I will try gain and background noise to oversample the dataset instead.
However, SMOTE helped a lot with the training time when combined with early stopping.
Reference: https://machinelearningmastery.com/smote-oversampling-for-imbalanced-classification/
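Applying SMOTE with the imbalanced-learn library is nearly a one-liner; a minimal sketch (X_train and y_train are placeholders, and fitting on the training split only keeps synthetic samples out of evaluation):

```python
from imblearn.over_sampling import SMOTE

sm = SMOTE(random_state=42)
# Synthesize minority-class (positive) samples until the classes are balanced
X_train_res, y_train_res = sm.fit_resample(X_train, y_train)
```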
I use a simple ANN model with dropout layers (rate 0.5) to avoid overfitting. The model here is quite simple; I would prefer CRNN + attention and a more complicated model for a 2D dataset instead of a 1D dataset like this. You know, it's just PHASE 1!
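A minimal sketch of such an ANN in Keras; the 0.5 dropout rate comes from the text above, while the layer widths, optimizer, and the n_features placeholder are illustrative assumptions:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(256, activation='relu', input_shape=(n_features,)),
    Dropout(0.5),                    # drop half the units to fight overfitting
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),  # binary output: covid vs non-covid
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```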
Here I use K-fold cross validation (stratified K-fold) with the oversampled data; a sketch follows the references below.
After all, I think K-fold is mainly a method to assess in general how good or bad the model is, which helps us tune the hyperparameters better.
References:
- https://viblo.asia/p/lam-chu-stacking-ensemble-learning-Az45b0A6ZxY
- https://www.machinecurve.com/index.php/2020/02/18/how-to-use-k-fold-cross-validation-with-keras/
- https://miai.vn/2021/01/18/k-fold-cross-validation-tuyet-chieu-train-khi-it-du-lieu/
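One common way to combine stratified K-fold with SMOTE is sketched below (build_model is a hypothetical helper standing in for the ANN above; oversampling inside each training fold keeps synthetic samples out of the validation fold):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from imblearn.over_sampling import SMOTE

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = []
for train_idx, val_idx in skf.split(X, y):
    X_tr, y_tr = X[train_idx], y[train_idx]
    X_val, y_val = X[val_idx], y[val_idx]
    # Oversample only the training fold; the validation fold stays untouched
    X_tr, y_tr = SMOTE(random_state=42).fit_resample(X_tr, y_tr)
    model = build_model()  # e.g. the ANN sketched above
    model.fit(X_tr, y_tr, epochs=50, batch_size=32, verbose=0)
    scores.append(model.evaluate(X_val, y_val, verbose=0)[1])
print(f"Mean CV accuracy: {np.mean(scores):.3f}")
```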
Sorry for the picture resolution; you can visit my Kaggle notebook to see the figures clearly.
Original data
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| Negative | 0.94 | 0.94 | 0.94 | 680 |
| Positive | 0.70 | 0.68 | 0.69 | 134 |
| accuracy | | | 0.90 | 814 |
Figure 1. Result with original data
Figure 2. AUC = 91% with original data
Oversampling data with SMOTE
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| Negative | 0.94 | 0.97 | 0.96 | 680 |
| Positive | 0.83 | 0.70 | 0.76 | 134 |
| accuracy | | | 0.93 | 814 |
Figure 3. Result with SMOTE
Figure 4. AUC = 93% with SMOTE
Oversampling data and k-fold cross validation
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| Negative | 0.97 | 0.96 | 0.96 | 680 |
| Positive | 0.81 | 0.84 | 0.82 | 134 |
| accuracy | | | 0.94 | 814 |
Figure 5. Result with SMOTE and K-fold cross validation
Figure 6. AUC = 95% with SMOTE and K-fold cross validation