Inventory data used and select biomedical datasets #9

Open
k8hertweck opened this issue Jul 23, 2020 · 1 comment

k8hertweck commented Jul 23, 2020

Data inventory from @lakikowolfe 👍 Note each dataset and its characteristics:

  • How the data was used
  • dtypes
  • missing data?
  • Include dummy datasets TA made
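The characteristics listed above (dtypes, missing data) could be collected per dataset with a small helper. This is only a sketch assuming pandas; the function name `inventory` and the toy data frame are hypothetical and are not one of the course datasets.

```python
import pandas as pd

def inventory(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize each column: dtype, missing count, and number of distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "n_unique": df.nunique(),  # missing values are not counted as a distinct value
    })

# Made-up toy data with one categorical and one numeric column.
toy = pd.DataFrame({
    "mode": ["bus", "car", None, "bike"],
    "minutes": [35.0, 20.0, 42.5, 15.0],
})
summary = inventory(toy)
print(summary)
```

Running the same helper over each class's datasets would give one comparable table per dataset.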

Class 1

  • Commute Time Dataset
    • Feature engineering and EDA
    • No missing data
    • Generic dataset with both categorical and numeric data

Class 2

  • Commute Time Dataset
    • Visualization of single variables and their relationships, linear regression, mean squared error, random forests
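For reference, the Class 2 modeling steps (linear regression, random forest, mean squared error) might look roughly like the sketch below. It assumes scikit-learn and uses synthetic data from `make_regression` as a stand-in, since the Commute Time Dataset itself is not shown in this issue.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption).
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "linear": LinearRegression(),
    "forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Fit each model and record its mean squared error on the held-out split.
mse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mse[name] = mean_squared_error(y_test, model.predict(X_test))

print(mse)
```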

Class 3

  • Dummy dataset of 0 and 1 as an example of categorical data
  • Dummy dataset of two random clouds of points to illustrate decision boundaries
  • Tennis dataset
    • All categorical variables; target variable is whether tennis was played (yes/no)
  • Iris dataset
    • All numeric variables except for target variable (categorical: species)
  • Dummy dataset for random forest

Class 4

  • Dummy data to show the curse of dimensionality
  • Iris dataset to show the benefits of PCA
    • Pair plot
    • PCA
  • Dummy data to superimpose the first component line over a series of random points
  • Dummy data and custom code to illustrate eigenvectors
  • Centered faces dataset: "Eigenfaces"
  • Dummy dataset of clusters to show k-means
  • Arrests data
    • Four numeric variables
  • NCI60 for PCA and hierarchical clustering
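As a rough illustration of the Iris PCA item in Class 4, the sketch below projects the four numeric features onto two principal components. It assumes scikit-learn and its bundled copy of Iris; standardizing the features first is a choice made here, not something stated in the inventory.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()

# Put the four numeric features on a common scale before PCA (assumption).
X_scaled = StandardScaler().fit_transform(iris.data)

pca = PCA(n_components=2)
scores = pca.fit_transform(X_scaled)

print(scores.shape)                   # one row per flower, one column per component
print(pca.explained_variance_ratio_)  # share of variance captured by each component
```

Plotting `scores` colored by `iris.target` gives the usual two-dimensional view of the species.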
k8hertweck self-assigned this Jul 23, 2020
@k8hertweck
Contributor Author

Streamlining approach relative to R ML course.

Class 1: glaucoma data

  • predicting age from other demographic data (linear regression)

Class 2: glaucoma data

  • classification
  • train/test
  • imbalanced data
  • pre-processing pipelines
  • build workflows
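One possible shape for the proposed Class 2 workflow (stratified train/test split, a pre-processing pipeline, and a classifier that accounts for class imbalance) is sketched below. It assumes scikit-learn; the data is a synthetic imbalanced stand-in from `make_classification`, not the glaucoma data, and logistic regression with balanced class weights is just one illustrative choice.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with roughly a 90/10 class split (assumption).
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3,
    weights=[0.9, 0.1], random_state=0,
)

# Stratify so both splits keep a similar class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Pre-processing and model in one object, so scaling is fit on training data only.
workflow = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
workflow.fit(X_train, y_train)

score = balanced_accuracy_score(y_test, workflow.predict(X_test))
print(score)
```

Balanced accuracy is used here because plain accuracy can look good on imbalanced data even when the minority class is mostly missed.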

Class 3: genomic data?

  • regression
  • imbalance
  • workflow

Class 4: genomic data?

  • unsupervised
  • clustering
  • PCA/UMAP
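The clustering part of the proposed Class 4 could be prototyped on dummy data before moving to genomic data. The sketch below assumes scikit-learn and compares k-means with hierarchical (agglomerative) clustering on three made-up, well-separated blobs. UMAP is left out because it requires the separate `umap-learn` package.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Three well-separated dummy clusters (made-up centers).
X, true_labels = make_blobs(
    n_samples=150,
    centers=[[0, 0], [5, 5], [-5, 5]],
    cluster_std=0.5,
    random_state=0,
)

kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
hier_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# The adjusted Rand index ignores how clusters are numbered, so it can compare
# each method's assignment against the known blob membership.
ari_kmeans = adjusted_rand_score(true_labels, kmeans_labels)
ari_hier = adjusted_rand_score(true_labels, hier_labels)
print(ari_kmeans, ari_hier)
```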
