This is the public repo for the course project of Getting and Cleaning Data on Coursera.
Below I describe how the script in this repo works.
The script runs in several steps, described in order below.
It uses the dplyr library.
First, I read the list of feature names into the variable feature_names, the activity names into the variable activity_names, and the lists of subjects for the training and test sets into the variables subject_train and subject_test, respectively.
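As a rough illustration of this reading step, here is a minimal sketch in base R. The inline strings stand in for the real input files, whose names and exact layout are not given here; the "index name" layout and the column names `index`, `name`, `label`, `activity` and `subject` are assumptions made only for the example.

```r
# Stand-in for the features file: one "index name" pair per line (assumed layout).
features_txt <- "1 tBodyAcc-mean()-X
2 tBodyAcc-std()-X
3 tBodyAcc-energy()-X"

feature_names <- read.table(text = features_txt,
                            col.names = c("index", "name"),
                            stringsAsFactors = FALSE)

# Stand-in for the activity labels file: numeric label followed by its name.
activities_txt <- "1 WALKING
2 SITTING"

activity_names <- read.table(text = activities_txt,
                             col.names = c("label", "activity"),
                             stringsAsFactors = FALSE)

# Stand-in for a subjects file: one subject id per observation.
subjects_txt <- "1
1
3"

subject_train <- read.table(text = subjects_txt, col.names = "subject")
```

With real files, the `text =` argument would be replaced by a path to the file on disk.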
Next, I read the training set into the variable X_train, set the feature names as its column names, read the training labels into the variable y_train, add these labels and the subjects to the training set, and map each label to its activity name.
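The assembly of the training set can be sketched on a tiny in-memory example. The values below are made up, and the column names `label`, `subject` and `activity` as well as the use of `match()` for the lookup are assumptions; the actual script may do this differently.

```r
# Toy inputs (made-up values) standing in for the data read from disk.
feature_names <- data.frame(index = 1:2,
                            name = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X"),
                            stringsAsFactors = FALSE)
activity_names <- data.frame(label = 1:2,
                             activity = c("WALKING", "SITTING"),
                             stringsAsFactors = FALSE)

X_train <- data.frame(c(0.28, 0.27, 0.29), c(-0.99, -0.98, -0.97))
y_train <- c(1, 2, 1)
subject_train <- c(1, 1, 3)

# Feature names become the column names of the measurements.
names(X_train) <- feature_names$name

# Attach the labels and the subjects as extra columns.
X_train$label <- y_train
X_train$subject <- subject_train

# Look up the activity name that belongs to each numeric label.
X_train$activity <- activity_names$activity[match(X_train$label,
                                                  activity_names$label)]
```

The test set is handled the same way, with X_test, y_test and subject_test in place of the training variables.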
I then read the test set into the variable X_test, set the feature names as its column names, read the test labels into the variable y_test, add these labels and the subjects to the test set, and map each label to its activity name.
After that, I merge the training and test sets into the variable data_set.
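Because both sets have the same columns, merging them amounts to stacking their rows. A minimal sketch with made-up values, assuming `rbind()` is used (the script may equally use a dplyr function for this):

```r
# Two toy sets with identical columns (made-up values).
X_train <- data.frame(subject = c(1, 3),
                      activity = c("WALKING", "SITTING"),
                      value = c(0.28, 0.27),
                      stringsAsFactors = FALSE)
X_test <- data.frame(subject = 2,
                     activity = "WALKING",
                     value = 0.30,
                     stringsAsFactors = FALSE)

# Stack the rows of the test set under the rows of the training set.
data_set <- rbind(X_train, X_test)
```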
Then I get the indices of the measurements that contain "mean()" in their name and build the vector new_names_mean, which holds new names for the mean measurements.
In the same way, I get the indices of the measurements that contain "std()" in their name and build the vector new_names_std, which holds new names for the std measurements.
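One way to find these indices is `grep()` with `fixed = TRUE`, so that "mean()" and "std()" are matched literally rather than as regular expressions. The sketch below uses a few sample feature names; the variable names `mean_idx` and `std_idx` and the renaming scheme (dropping the parentheses and replacing dashes with underscores) are hypothetical, since the exact new names are not specified here.

```r
# A few sample feature names for illustration.
feature_names <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                   "fBodyAcc-meanFreq()-X", "tBodyAcc-energy()-X")

# fixed = TRUE matches the text "mean()" / "std()" literally.
mean_idx <- grep("mean()", feature_names, fixed = TRUE)
std_idx <- grep("std()", feature_names, fixed = TRUE)

# Hypothetical renaming scheme: drop "()" and use underscores instead of dashes.
new_names_mean <- gsub("-mean()", "_mean", feature_names[mean_idx], fixed = TRUE)
new_names_mean <- gsub("-", "_", new_names_mean, fixed = TRUE)

new_names_std <- gsub("-std()", "_std", feature_names[std_idx], fixed = TRUE)
new_names_std <- gsub("-", "_", new_names_std, fixed = TRUE)
```

With literal matching, a name such as "fBodyAcc-meanFreq()-X" is not selected, because it does not contain the exact text "mean()".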
Next, I replace the old names of the mean and std columns with the new ones, store the indices of the subject id and activity name columns in the variables subject_idx and activity_idx, and keep in the variable data_set only the columns that hold mean() and std() measurements.
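The renaming and column selection could look roughly like this. The data and the new column names are made up, and the sketch assumes that the subject and activity columns are kept alongside the mean and std measurements so that the later summary can group by them.

```r
# Toy data set (made-up values) with original feature names as column names.
data_set <- data.frame(c(0.28, 0.27), c(-0.99, -0.98), c(0.1, 0.2),
                       c(1, 2), c("WALKING", "SITTING"),
                       stringsAsFactors = FALSE)
names(data_set) <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X",
                     "tBodyAcc-energy()-X", "subject", "activity")

# Locate the mean and std columns by literal match.
mean_idx <- grep("mean()", names(data_set), fixed = TRUE)
std_idx <- grep("std()", names(data_set), fixed = TRUE)

# Replace the old names with the new ones (hypothetical new names).
names(data_set)[mean_idx] <- "tBodyAcc_mean_X"
names(data_set)[std_idx] <- "tBodyAcc_std_X"

# Locate the identifying columns.
subject_idx <- which(names(data_set) == "subject")
activity_idx <- which(names(data_set) == "activity")

# Keep only the identifying columns and the mean/std measurements.
data_set <- data_set[, c(subject_idx, activity_idx, mean_idx, std_idx)]
```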
Finally, I summarize the columns of data_set with the function aggregate, store the resulting tidy data in the variable tidy_data, and slightly modify the column names of the tidy data.
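A minimal sketch of the summary step with `aggregate()` follows. It assumes the summary is the mean of each measurement per subject and activity, which is not stated explicitly above, and the `avg_` prefix used for the renamed column is hypothetical.

```r
# Toy data set (made-up values): two subjects, two observations each.
data_set <- data.frame(subject = c(1, 1, 2, 2),
                       activity = c("WALKING", "WALKING", "WALKING", "WALKING"),
                       tBodyAcc_mean_X = c(0.2, 0.4, 0.1, 0.3),
                       stringsAsFactors = FALSE)

# Average the measurement within each (subject, activity) group.
tidy_data <- aggregate(data_set["tBodyAcc_mean_X"],
                       by = list(subject = data_set$subject,
                                 activity = data_set$activity),
                       FUN = mean)

# Adjust the measurement column name to show that it now holds an average.
names(tidy_data)[3] <- "avg_tBodyAcc_mean_X"
```

Each row of tidy_data then holds one subject, one activity and the summarized measurements for that pair.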