The input data consists of a set of measurements from accelerometer (3-axial linear acceleration) and gyroscope (3-axial angular velocity) sensors, recorded from a group of 30 volunteers performing 6 different activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING). The data is partitioned into train and test sets, where the former covers 70% of the volunteers and the latter 30%.
Because the acceleration signal is made up of gravitational and body motion components, the authors separated those components and built a feature vector from the following signals: (tBodyAcc, tGravityAcc, tBodyAccJerk, tBodyGyro, tBodyGyroJerk, tBodyAccMag, tGravityAccMag, tBodyAccJerkMag, tBodyGyroMag, tBodyGyroJerkMag), with the 3-axial signals split into their X, Y and Z components. Signals labeled Jerk were obtained by differentiating with respect to time (AccJerk yields linear jerk and GyroJerk yields angular acceleration). For each signal the authors computed several estimates, such as mean(), std(), and meanFreq() (see features_info.txt for the full listing).
With the exception of tGravityAcc, tBodyGyroJerk, and tGravityAccMag, a Fast Fourier Transform (FFT) was applied to the signals mentioned above, yielding the corresponding frequency-domain signals (fBodyAcc, fBodyAccJerk, fBodyGyro, fBodyAccMag, fBodyAccJerkMag, fBodyGyroMag, fBodyGyroJerkMag).
The data has no units because the authors provide it already normalized.
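As a rough illustration of how a Jerk signal relates to its source signal, the sketch below approximates a time derivative with a finite difference. It is not the authors' processing code: the function name `finite_difference`, the sample values, and the 50 Hz sampling rate passed in are assumptions made for this example only.

```python
def finite_difference(samples, sample_rate_hz):
    """Approximate the time derivative of a uniformly sampled signal."""
    dt = 1.0 / sample_rate_hz  # time between consecutive samples, in seconds
    return [(b - a) / dt for a, b in zip(samples, samples[1:])]


# Hypothetical body-acceleration samples along one axis (made-up values).
body_acc_x = [0.0, 0.02, 0.06, 0.06]

# Differentiating acceleration gives jerk; the same idea applied to
# angular velocity would give angular acceleration.
jerk_x = finite_difference(body_acc_x, sample_rate_hz=50.0)
```

The result has one sample fewer than the input, because each output value is computed from a pair of neighbouring samples.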
To transform the partitioned, separate data files into a tidy data set, the following steps were performed:
- Merge train and test data into one merged data set
  - The train data is formed by column-binding the subject ids, the activity ids, and the full set of feature measurements from the train data
  - The test data is formed the same way, by column-binding the subject ids, the activity ids, and the full set of feature measurements from the test data
  - The merged data is the row bind of the train data and the test data
- Extract (keep) only the measurements (features) with mean or standard deviation
  - Select the columns whose names contain either the keyword mean or the keyword std
  - The filtered data is the subset of the merged data with the selected columns
- Obtain the activity names and place them into the data set
  - Generate a vector of activity labels by matching each activity id in the merged data to its corresponding activity label
  - Insert the activity labels into the filtered data set
- Appropriately label the data set with descriptive variable names (column names)
  - The filtered column names (features) are abbreviations, so the following string-pattern substitutions are used to make them human readable:
    - "^t" -> "TD-"
    - "^f" -> "FD-"
    - "mean\\(\\)" -> "Mean"
    - "std\\(\\)" -> "Std"
    - "meanFreq\\(\\)" -> "MeanFreq"
    - "AccJerk" -> "LinearJerk"
    - "GyroJerk" -> "AngularAcceleration"
    - "AccMag" -> "LinearAccelerationMagnitude"
    - "GyroMag" -> "AngularVelocityMagnitude"
    - "Acc-" -> "LinearAcceleration-"
    - "Gyro-" -> "AngularVelocity-"
    - "Mag-" -> "Magnitude-"
    - "BodyBody" -> "Body"
    - "Body" -> "Body-"
    - "Gravity" -> "Gravity-"
  - Some abbreviations are replaced with their full name or with the quantity they correspond to (AccJerk is Linear Jerk, GyroJerk is Angular Acceleration, etc.). TD and FD stand for Time-Domain and Frequency-Domain, respectively.
Description based on: https://www.elen.ucl.ac.be/Proceedings/esann/esannpdf/es2013-84.pdf
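The renaming step can be sketched as an ordered list of regular-expression substitutions. This is a Python illustration, not the script this code book describes: the name `descriptive_name` is hypothetical, the raw feature names in the usage note are assumed to follow the `tBodyAcc-mean()-X` style, and applying the patterns in the listed order is an assumption of this sketch (in it, "BodyBody" must be collapsed before "Body" gains its hyphen).

```python
import re

# (pattern, replacement) pairs, applied one after another in this order.
SUBSTITUTIONS = [
    (r"^t", "TD-"),
    (r"^f", "FD-"),
    (r"mean\(\)", "Mean"),
    (r"std\(\)", "Std"),
    (r"meanFreq\(\)", "MeanFreq"),
    (r"AccJerk", "LinearJerk"),
    (r"GyroJerk", "AngularAcceleration"),
    (r"AccMag", "LinearAccelerationMagnitude"),
    (r"GyroMag", "AngularVelocityMagnitude"),
    (r"Acc-", "LinearAcceleration-"),
    (r"Gyro-", "AngularVelocity-"),
    (r"Mag-", "Magnitude-"),
    (r"BodyBody", "Body"),
    (r"Body", "Body-"),
    (r"Gravity", "Gravity-"),
]


def descriptive_name(feature):
    """Expand an abbreviated feature name into its descriptive form."""
    for pattern, replacement in SUBSTITUTIONS:
        feature = re.sub(pattern, replacement, feature)
    return feature
```

With these rules, `descriptive_name("tBodyAcc-mean()-X")` gives `TD-Body-LinearAcceleration-Mean-X`, and `descriptive_name("fBodyBodyGyroJerkMag-meanFreq()")` gives `FD-Body-AngularAccelerationMagnitude-MeanFreq`.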
- From the data set in step 4, create a second, independent tidy data set with the average of each variable for each activity and each subject.
  - The completed data set of step 4 is melted into id variables and measure variables, so that each row holds one unique id-variable combination
  - The melted data is then cast back to wide form (dcast), using subject and activity as identifiers and mean as the aggregation function
  - The resulting data set is then exported to a text file and, for those who prefer it, to an xls file
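The merge, select, label, and average steps above can be sketched on a toy data set as follows. This is a plain-Python illustration under stated assumptions, not the script this code book describes: all values, the three-feature layout, and names such as `column_bind` and `tidy` are made up for the example, and the activity label here replaces the activity id column for brevity. The renaming step is left out to keep the sketch short.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical miniature inputs; every value is made up for illustration.
features = ["tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X"]
activity_labels = {1: "WALKING", 4: "SITTING"}
train = {"subject": [1, 1, 3], "activity": [1, 1, 4],
         "X": [[0.2, 0.1, 0.9], [0.4, 0.3, 0.8], [0.1, 0.0, 0.5]]}
test = {"subject": [2], "activity": [4], "X": [[0.3, 0.2, 0.7]]}


def column_bind(part):
    """Build rows of [subject id, activity id, measurements...]."""
    return [[s, a] + row
            for s, a, row in zip(part["subject"], part["activity"], part["X"])]


# Merge: row-bind the column-bound train and test data.
merged = column_bind(train) + column_bind(test)

# Select: keep only the measurement columns whose name contains mean or std.
keep = [i for i, name in enumerate(features) if "mean" in name or "std" in name]
kept_names = [features[i] for i in keep]
filtered = [[row[0], row[1]] + [row[2 + i] for i in keep] for row in merged]

# Label: look up the activity label for each activity id.
labeled = [[row[0], activity_labels[row[1]]] + row[2:] for row in filtered]

# Melt: one entry per (subject, activity, variable) combination.
melted = defaultdict(list)
for subject, activity, *values in labeled:
    for name, value in zip(kept_names, values):
        melted[(subject, activity, name)].append(value)

# Cast: back to one row per (subject, activity), aggregating with mean.
tidy = {}
for (subject, activity, name), values in melted.items():
    tidy.setdefault((subject, activity), {})[name] = mean(values)
```

In this toy run, subject 1 has two WALKING observations, so its row holds the average of both, while the `max()` column is dropped by the selection step.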
The output data consists of the columns listed below. Subject_ID is the subject (1-30), Activity_Label is the name of the activity performed by the subject (LAYING, SITTING, STANDING, WALKING, WALKING_DOWNSTAIRS, WALKING_UPSTAIRS), and the remaining columns hold the measured data aggregated with mean. Because the input measurements are normalized, the output measurements have no units.
- Subject_ID
- Activity_Label
- TD-Body-LinearAcceleration-Mean-X
- TD-Body-LinearAcceleration-Mean-Y
- TD-Body-LinearAcceleration-Mean-Z
- TD-Body-LinearAcceleration-Std-X
- TD-Body-LinearAcceleration-Std-Y
- TD-Body-LinearAcceleration-Std-Z
- TD-Gravity-LinearAcceleration-Mean-X
- TD-Gravity-LinearAcceleration-Mean-Y
- TD-Gravity-LinearAcceleration-Mean-Z
- TD-Gravity-LinearAcceleration-Std-X
- TD-Gravity-LinearAcceleration-Std-Y
- TD-Gravity-LinearAcceleration-Std-Z
- TD-Body-LinearJerk-Mean-X
- TD-Body-LinearJerk-Mean-Y
- TD-Body-LinearJerk-Mean-Z
- TD-Body-LinearJerk-Std-X
- TD-Body-LinearJerk-Std-Y
- TD-Body-LinearJerk-Std-Z
- TD-Body-AngularVelocity-Mean-X
- TD-Body-AngularVelocity-Mean-Y
- TD-Body-AngularVelocity-Mean-Z
- TD-Body-AngularVelocity-Std-X
- TD-Body-AngularVelocity-Std-Y
- TD-Body-AngularVelocity-Std-Z
- TD-Body-AngularAcceleration-Mean-X
- TD-Body-AngularAcceleration-Mean-Y
- TD-Body-AngularAcceleration-Mean-Z
- TD-Body-AngularAcceleration-Std-X
- TD-Body-AngularAcceleration-Std-Y
- TD-Body-AngularAcceleration-Std-Z
- TD-Body-LinearAccelerationMagnitude-Mean
- TD-Body-LinearAccelerationMagnitude-Std
- TD-Gravity-LinearAccelerationMagnitude-Mean
- TD-Gravity-LinearAccelerationMagnitude-Std
- TD-Body-LinearJerkMagnitude-Mean
- TD-Body-LinearJerkMagnitude-Std
- TD-Body-AngularVelocityMagnitude-Mean
- TD-Body-AngularVelocityMagnitude-Std
- TD-Body-AngularAccelerationMagnitude-Mean
- TD-Body-AngularAccelerationMagnitude-Std
- FD-Body-LinearAcceleration-Mean-X
- FD-Body-LinearAcceleration-Mean-Y
- FD-Body-LinearAcceleration-Mean-Z
- FD-Body-LinearAcceleration-Std-X
- FD-Body-LinearAcceleration-Std-Y
- FD-Body-LinearAcceleration-Std-Z
- FD-Body-LinearAcceleration-MeanFreq-X
- FD-Body-LinearAcceleration-MeanFreq-Y
- FD-Body-LinearAcceleration-MeanFreq-Z
- FD-Body-LinearJerk-Mean-X
- FD-Body-LinearJerk-Mean-Y
- FD-Body-LinearJerk-Mean-Z
- FD-Body-LinearJerk-Std-X
- FD-Body-LinearJerk-Std-Y
- FD-Body-LinearJerk-Std-Z
- FD-Body-LinearJerk-MeanFreq-X
- FD-Body-LinearJerk-MeanFreq-Y
- FD-Body-LinearJerk-MeanFreq-Z
- FD-Body-AngularVelocity-Mean-X
- FD-Body-AngularVelocity-Mean-Y
- FD-Body-AngularVelocity-Mean-Z
- FD-Body-AngularVelocity-Std-X
- FD-Body-AngularVelocity-Std-Y
- FD-Body-AngularVelocity-Std-Z
- FD-Body-AngularVelocity-MeanFreq-X
- FD-Body-AngularVelocity-MeanFreq-Y
- FD-Body-AngularVelocity-MeanFreq-Z
- FD-Body-LinearAccelerationMagnitude-Mean
- FD-Body-LinearAccelerationMagnitude-Std
- FD-Body-LinearAccelerationMagnitude-MeanFreq
- FD-Body-LinearJerkMagnitude-Mean
- FD-Body-LinearJerkMagnitude-Std
- FD-Body-LinearJerkMagnitude-MeanFreq
- FD-Body-AngularVelocityMagnitude-Mean
- FD-Body-AngularVelocityMagnitude-Std
- FD-Body-AngularVelocityMagnitude-MeanFreq
- FD-Body-AngularAccelerationMagnitude-Mean
- FD-Body-AngularAccelerationMagnitude-Std
- FD-Body-AngularAccelerationMagnitude-MeanFreq