The Iris flower data set or Fisher's Iris data set is a multivariate data set introduced by the British statistician and biologist Ronald Fisher in his 1936 paper. The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters. Based on the combination of these four features, Fisher developed a linear discriminant model to distinguish the species from each other.
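A discriminant model of this kind can be tried on the same four features with scikit-learn's LinearDiscriminantAnalysis. That class is a multiclass generalization of Fisher's approach which assumes the species share one covariance matrix, so it is not a reproduction of his 1936 calculation. This minimal sketch assumes scikit-learn is installed. It fits on all 150 samples and reports training accuracy only, so it says nothing about how well the model handles unseen flowers.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# X: 150 x 4 measurements in cm; y: species labels 0, 1, 2
X, y = load_iris(return_X_y=True)

# Fit a linear discriminant model on the four measurements
lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

# Fraction of the training samples assigned to the correct species
train_accuracy = lda.score(X, y)
print(f"training accuracy: {train_accuracy:.3f}")

# Project onto the discriminant axes (at most n_classes - 1 = 2 of them)
X_proj = lda.transform(X)
print(X_proj.shape)
```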
A basic table is a two-dimensional grid of data in which the rows represent individual elements of the dataset and the columns represent quantities related to each of these elements. The columns in this table are SepalLength, SepalWidth, PetalLength, PetalWidth and Species (link).
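The same table can be built as a pandas DataFrame from the copy of the data that ships with scikit-learn. This sketch assumes pandas and scikit-learn are installed. The column names are chosen to match the list above and are not the names scikit-learn uses.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Each row is one flower; each column is one measured quantity (cm)
table = pd.DataFrame(
    iris.data,
    columns=["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"],
)

# Map the integer labels 0/1/2 to species names for the last column
table["Species"] = iris.target_names[iris.target]

print(table.shape)  # 150 flowers, 5 columns
print(table.head())
```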
The Iris dataset was used in R.A. Fisher's classic 1936 paper, "The Use of Multiple Measurements in Taxonomic Problems", and can also be found on the UCI Machine Learning Repository. It includes three iris species with 50 samples each, along with the measurements of each flower. One species is linearly separable from the other two, but the other two are not linearly separable from each other. The columns in this dataset are SepalLengthCm, SepalWidthCm, PetalLengthCm, PetalWidthCm and Species.
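The separable species is Iris setosa. This can be checked with a single threshold on petal length, which is a linear boundary in the four-dimensional feature space. The minimal sketch below uses scikit-learn's copy of the data. The 2.5 cm threshold is an illustrative choice and does not come from the text.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
petal_length = iris.data[:, 2]   # third feature: petal length (cm)
is_setosa = iris.target == 0     # label 0 is setosa

# A linear rule: predict setosa when petal length is below 2.5 cm
predicted_setosa = petal_length < 2.5
errors = np.sum(predicted_setosa != is_setosa)
print("misclassified:", errors)

# Largest setosa petal vs. smallest non-setosa petal; a gap means separable
print(petal_length[is_setosa].max(), petal_length[~is_setosa].min())
```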
from sklearn import datasets

# Load the Iris data bundled with scikit-learn and print it
iris = datasets.load_iris()
print(iris)
Adapted from link
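load_iris() returns a scikit-learn Bunch, which is a dictionary-like object whose fields can also be read as attributes. Printing the whole object is hard to read, so this short sketch prints only the most commonly used fields.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

print(iris.data.shape)     # measurement array: one row per flower, 4 columns (cm)
print(iris.feature_names)  # names of the four measurement columns
print(iris.target_names)   # the three species names

# Count samples per species label
counts = np.bincount(iris.target)
print(counts)
```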
To explore the Iris data, we look at its columns and shape attributes and call the min(), max(), mean(), std(), median(), head(), tail() and isnull() methods.
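import pandas as pd

# Load the Iris CSV (this path is on the author's machine; change it for yours)
iris_data = pd.read_csv('C:/Users/geeth/Desktop/problemset-pands/Iris.csv')
# Give the five columns short, consistent names
iris_data.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']

print(iris_data.head(10))                 # first 10 rows
print(iris_data.columns)                  # column labels
print(iris_data.shape)                    # (number of rows, number of columns)
print(iris_data.min())                    # smallest value in each column
print(iris_data.max())                    # largest value in each column
# species holds text, so restrict these statistics to the numeric columns
print(iris_data.mean(numeric_only=True))
print(iris_data.std(numeric_only=True))
print(iris_data.median(numeric_only=True))
print(iris_data.head())                   # first 5 rows
print(iris_data.tail())                   # last 5 rows
print(iris_data.isnull().sum())           # missing values per column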
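The CSV path above exists only on the author's machine. The self-contained sketch below computes the same summaries from scikit-learn's copy of the data, assuming it holds the same measurements as the CSV. The column names mirror the renamed ones above. The sketch also groups by species, because the column-wide statistics mix all three species together.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
iris_data = pd.DataFrame(
    iris.data,
    columns=["sepal_length", "sepal_width", "petal_length", "petal_width"],
)
iris_data["species"] = iris.target_names[iris.target]

# Column-wise statistics over the four numeric columns, side by side
summary = pd.DataFrame({
    "min": iris_data.min(numeric_only=True),
    "max": iris_data.max(numeric_only=True),
    "mean": iris_data.mean(numeric_only=True),
    "std": iris_data.std(numeric_only=True),
    "median": iris_data.median(numeric_only=True),
})
print(summary)

# Means computed separately for each species
by_species = iris_data.groupby("species").mean()
print(by_species)
```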
First, a box plot shows the distribution of each measurement on its own (a univariate view).
import matplotlib.pyplot as plt

# Box-and-whisker plot for each numeric column, arranged on a 2 x 2 grid
iris_data.plot(kind='box', subplots=True, layout=(2,2), sharex=False, sharey=False)
plt.show()
Adapted from https://medium.com/codebagng/basic-analysis-of-the-iris-data-set-using-python-2995618a6342
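Each box spans the first to the third quartile. pandas draws kind='box' with matplotlib, whose default whis=1.5 extends the whiskers to the most extreme data points within 1.5 times the interquartile range (IQR) of the box. Points beyond that range are drawn individually as outliers. The sketch below applies that rule to sepal width, using scikit-learn's copy of the data.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
sepal_width = pd.Series(iris.data[:, 1], name="sepal_width")  # second feature

q1 = sepal_width.quantile(0.25)
q3 = sepal_width.quantile(0.75)
iqr = q3 - q1

# Whisker limits: 1.5 * IQR beyond the edges of the box
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Whiskers stop at the most extreme observations inside the limits
inside = sepal_width[(sepal_width >= lower_limit) & (sepal_width <= upper_limit)]
whisker_low, whisker_high = inside.min(), inside.max()

# Points outside the limits are plotted individually as outliers
outliers = sepal_width[(sepal_width < lower_limit) | (sepal_width > upper_limit)]
print(q1, q3, whisker_low, whisker_high)
print(sorted(outliers))
```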
Plotting each measurement as a histogram instead:
# One histogram per numeric column
iris_data.hist()
plt.show()
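Unless bins is passed, DataFrame.hist() uses 10 equal-width bins for each column. The counts behind a single panel can be reproduced with NumPy. The sketch below does this for petal length, using scikit-learn's copy of the data.

```python
import numpy as np
from sklearn.datasets import load_iris

petal_length = load_iris().data[:, 2]  # third feature: petal length (cm)

# Split [min, max] into 10 equal-width bins and count the samples in each
counts, edges = np.histogram(petal_length, bins=10)
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.2f}-{right:.2f} cm: {n}")
```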
The sepal and petal measurements for each Iris species were obtained as a text file and saved as a CSV. Here pandas reads that file from a folder on my PC into a DataFrame.
import pandas as pd

# Read the CSV from a local folder (adjust the path for your machine)
df = pd.read_csv('C:/Users/geeth/Desktop/problemset-pands/Iris.csv')
df
df.head()
df.head() displays the first five rows of the table.
df.describe()
df.describe() summarizes each numeric column (the sepal and petal lengths and widths) with its count, mean, std, min, 25%, 50%, 75% and max. The Species column holds text, so it is left out by default.
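Each row of describe() corresponds to an ordinary pandas reduction. For example, the 50% row is the median, and the std row is the sample standard deviation (ddof=1) that DataFrame.std() returns. The sketch below checks this on scikit-learn's copy of the data.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

stats = df.describe()

# The 50% row equals the median; the std row equals DataFrame.std() (ddof=1)
median_matches = (stats.loc["50%"] - df.median()).abs().max() < 1e-12
std_matches = (stats.loc["std"] - df.std()).abs().max() < 1e-12
print(stats.index.tolist())
print(median_matches, std_matches)
```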
df.info()
df.info() prints the DataFrame's class, its RangeIndex, the name, non-null count and data type of each column (the sepal and petal lengths and widths and the species), and the memory usage.
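This shows the mean absolute deviation of each numeric column as a single row. DataFrame.mad() was removed in pandas 2.0, so the deviation is computed directly:
numeric = df.select_dtypes(include='number')
pd.DataFrame((numeric - numeric.mean()).abs().mean(), columns=["Mad"]).T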
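For n values x_1, ..., x_n with mean x̄, the mean absolute deviation is (1/n) Σ |x_i − x̄|, the average distance of each value from the mean. The small sketch below applies that formula to a toy list so the arithmetic can be followed by hand. The function name is hypothetical.

```python
import numpy as np

def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean (hypothetical helper)."""
    x = np.asarray(values, dtype=float)
    return np.mean(np.abs(x - x.mean()))

# mean is 2.5; deviations are 1.5, 0.5, 0.5, 1.5; their average is 1.0
print(mean_absolute_deviation([1, 2, 3, 4]))  # 1.0
```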
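import seaborn as sns
import matplotlib.pyplot as plt

# Seaborn's copy of the Iris data (downloaded on first use)
iris = sns.load_dataset("iris")

# Draw a scatter plot in every cell of the grid, one cell per pair of numeric columns
g = sns.PairGrid(iris)
g.map(plt.scatter)
plt.show()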
Adapted from link
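The scatter panels of the pair grid show how closely each pair of measurements moves together. The Pearson correlation matrix is a numerical counterpart to those panels. The sketch below computes it from scikit-learn's copy of the data, so it runs without the download that sns.load_dataset performs. The column names are chosen for readability.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(
    iris.data,
    columns=["sepal_length", "sepal_width", "petal_length", "petal_width"],
)

# Pearson correlation (the pandas default) between every pair of measurements
corr = df.corr()
print(corr.round(2))

# Two contrasting panels: a tight petal relationship and a loose sepal one
petal_r = corr.loc["petal_length", "petal_width"]
sepal_r = corr.loc["sepal_length", "sepal_width"]
print(f"petal length vs petal width: r = {petal_r:.3f}")
print(f"sepal length vs sepal width: r = {sepal_r:.3f}")
```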