Yegor-Budnikov/getting-cleaning-data

Welcome to my repo getting-cleaning-data

This is a public repo for the course project of Getting and Cleaning Data on Coursera.

Below I describe how the things in this repo work.

run_analysis.R

This script contains several steps.

Import

This script uses the dplyr library.

Getting some names

In this part I read the list of feature names into the variable feature_names, read the activity names into the variable activity_names, and read the lists of subjects from the training and test sets into the variables subject_train and subject_test respectively.
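A minimal sketch of this reading step in base R. The file contents below are small made-up stand-ins written to temporary files, and the column names passed to read.table are assumptions for illustration, not necessarily what run_analysis.R uses.

```r
# Toy stand-ins for the feature, activity and subject files (hypothetical contents).
features_file <- tempfile(fileext = ".txt")
writeLines(c("1 tBodyAcc-mean()-X",
             "2 tBodyAcc-std()-X",
             "3 tBodyAcc-energy()-X"), features_file)

activities_file <- tempfile(fileext = ".txt")
writeLines(c("1 WALKING", "2 SITTING"), activities_file)

subject_train_file <- tempfile(fileext = ".txt")
writeLines(c("1", "1", "3"), subject_train_file)

# Each file is whitespace-separated with no header row.
feature_names <- read.table(features_file,
                            col.names = c("index", "name"),
                            stringsAsFactors = FALSE)
activity_names <- read.table(activities_file,
                             col.names = c("label", "activity"),
                             stringsAsFactors = FALSE)
subject_train <- read.table(subject_train_file, col.names = "subject")
```

subject_test would be read the same way from the test set's subject file.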

Getting the training set

In this part I read the training set into the variable X_train, set the feature names as column names of the training set, read the labels of the training set into the variable y_train, add these labels to the training set, add the subjects to the training set and map each label to its activity name.
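The steps for one set can be sketched on toy data as follows. The values, the added column names (label, subject, activity) and the use of match for the label lookup are assumptions made for illustration; the actual script may do this differently, for example with dplyr.

```r
# Toy inputs standing in for the data read earlier (hypothetical values).
feature_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X")
activity_names <- data.frame(label = 1:2,
                             activity = c("WALKING", "SITTING"),
                             stringsAsFactors = FALSE)

X_train <- data.frame(c(0.1, 0.2, 0.3), c(0.4, 0.5, 0.6))
y_train <- c(2, 1, 2)
subject_train <- c(1, 1, 3)

# Use the feature names as column names of the measurements.
names(X_train) <- feature_names

# Attach labels and subjects as extra columns.
X_train$label <- y_train
X_train$subject <- subject_train

# Look up the activity name that belongs to each numeric label.
X_train$activity <- activity_names$activity[match(X_train$label,
                                                  activity_names$label)]
```

The test set goes through the same steps with X_test, y_test and subject_test.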

Getting the test set

In this part I read the test set into the variable X_test, set the feature names as column names of the test set, read the labels of the test set into the variable y_test, add these labels to the test set, add the subjects to the test set and map each label to its activity name.

Merging sets

In this part I merge the training and the test sets into the variable data_set.
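Since both sets share the same columns, one way to combine them is to stack their rows. This is a sketch with rbind on toy data; whether the script uses rbind, a dplyr function or something else is not stated here, so treat the call as an assumption.

```r
# Toy training and test sets with identical columns (hypothetical values).
X_train <- data.frame(subject = c(1, 3),
                      activity = c("WALKING", "SITTING"),
                      value = c(0.1, 0.2),
                      stringsAsFactors = FALSE)
X_test <- data.frame(subject = 2,
                     activity = "WALKING",
                     value = 0.3,
                     stringsAsFactors = FALSE)

# Stack the rows of both sets into one data frame.
data_set <- rbind(X_train, X_test)
```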

Looking for the measurements on mean

In this part I get the indices of the measurements that contain "mean()" in the name and form the vector new_names_mean that contains new names for the mean measurements.
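A sketch of how such indices can be found with grep. The feature names are toy examples, mean_idx is a hypothetical variable name, and the renaming scheme is an assumption, since the exact new names are not described here.

```r
# Toy feature names (hypothetical subset).
feature_names <- c("tBodyAcc-mean()-X",
                   "tBodyAcc-std()-X",
                   "fBodyAcc-meanFreq()-X",
                   "tBodyGyro-mean()-Z")

# fixed = TRUE matches the literal text "mean()". As a regular expression,
# "mean()" would also match "meanFreq()" because "()" is an empty group.
mean_idx <- grep("mean()", feature_names, fixed = TRUE)

# Hypothetical renaming: drop the parentheses and use underscores.
new_names_mean <- gsub("-mean()", "_mean", feature_names[mean_idx], fixed = TRUE)
new_names_mean <- gsub("-", "_", new_names_mean, fixed = TRUE)
```

The same pattern with "std()" gives the indices and new names for the standard deviation measurements.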

Looking for the measurements on standard deviation

In this part I get the indices of the measurements that contain "std()" in the name and form the vector new_names_std that contains new names for the std measurements.

Cleaning the data

In this part I replace the old names of the proper columns with the new ones, store the indices of the columns with the activity name and the subject id in the variables activity_idx and subject_idx, and keep in the variable data_set only those columns that were collected with mean() and std().
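The cleaning step can be sketched on a toy data frame as below. The new column names are hypothetical, and keeping the subject and activity columns next to the measurements is an assumption made so that the data can still be grouped later.

```r
# Toy merged data set with three measurements (hypothetical values).
data_set <- data.frame(a = c(0.1, 0.2), b = c(0.3, 0.4), c = c(0.5, 0.6),
                       subject = c(1, 2),
                       activity = c("WALKING", "SITTING"),
                       stringsAsFactors = FALSE)
names(data_set)[1:3] <- c("tBodyAcc-mean()-X",
                          "tBodyAcc-std()-X",
                          "tBodyAcc-energy()-X")

# Locate the mean() and std() measurement columns by literal match.
mean_idx <- grep("mean()", names(data_set), fixed = TRUE)
std_idx <- grep("std()", names(data_set), fixed = TRUE)

# Replace the old names with new ones (hypothetical naming scheme).
names(data_set)[mean_idx] <- "tBodyAcc_mean_X"
names(data_set)[std_idx] <- "tBodyAcc_std_X"

# Locate the identifying columns.
activity_idx <- which(names(data_set) == "activity")
subject_idx <- which(names(data_set) == "subject")

# Keep the identifiers plus the mean() and std() measurements only.
data_set <- data_set[, c(subject_idx, activity_idx, mean_idx, std_idx)]
```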

Getting the tidy data

In this part I summarize the columns of data_set with the function aggregate, put the resulting tidy data into the variable tidy_data and slightly modify the column names of the tidy data.
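A sketch of the summary step with aggregate on toy data. Grouping by subject and activity, using mean as the summary function, and the "avg_" prefix are all assumptions for illustration; the text above only says that aggregate is used and that the column names are adjusted.

```r
# Toy cleaned data set (hypothetical values).
data_set <- data.frame(subject = c(1, 1, 2, 2),
                       activity = c("WALKING", "WALKING", "WALKING", "SITTING"),
                       tBodyAcc_mean_X = c(0.1, 0.3, 0.5, 0.7),
                       stringsAsFactors = FALSE)

# Summarize each measurement column per (subject, activity) group.
tidy_data <- aggregate(data_set["tBodyAcc_mean_X"],
                       by = list(subject = data_set$subject,
                                 activity = data_set$activity),
                       FUN = mean)

# Hypothetical renaming to mark the column as a summarized value.
names(tidy_data)[3] <- paste0("avg_", names(tidy_data)[3])
```

Each row of tidy_data then holds one subject and activity pair with one summarized value per measurement.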
