This package can be installed by running `pip install .` from the project's root directory.
- datamanip
  - CentralValues: Returns a dictionary of descriptive statistics: mean, median, range, variance, standard deviation, and quantiles.
  - DataOps: Functions for working with pandas dataframes: (1) dataDep_csv(infile) returns a deduplicated dataframe; (2) dataFrameSplit(dataframe, no of records) splits a dataframe according to the number of records needed.
  - datasetSeparator: Utilities for inspecting a pandas dataframe and manipulating its columns. Current functions: (1) displayCols(dataframe) displays the columns; (2) remCols(dataframe) removes columns; (3) sep_data_target(dataframe) separates the data from the target.
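
As a rough illustration of the kind of dictionary CentralValues is described as returning, here is a minimal sketch using numpy. The function name `central_values`, the dictionary keys, the default quantile levels, and the use of population variance are all assumptions made for this example; the package's actual interface may differ.

```python
import numpy as np


def central_values(values, quantiles=(0.25, 0.5, 0.75)):
    """Hypothetical stand-in for CentralValues: summarize a 1-D numeric sequence."""
    data = np.asarray(values, dtype=float)
    return {
        "mean": float(np.mean(data)),
        "median": float(np.median(data)),
        "range": float(np.max(data) - np.min(data)),
        # Population variance (ddof=0) is an assumption; the package may use
        # the sample variance (ddof=1) instead.
        "variance": float(np.var(data)),
        "std": float(np.std(data)),
        "quantiles": np.quantile(data, quantiles).tolist(),
    }


summary = central_values([1, 2, 3, 4, 5])
print(summary)
```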
- externals
  - LoadDataset: (1) load_pickle(filestr) loads a pickled object; (2) data_target_separator(numpy array) separates a numpy dataset into data and target, assuming the last column contains the labels.
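
The "last column contains the labels" convention can be sketched as follows. This is only an illustration: `split_data_target` is a hypothetical name, and the input validation is an assumption, not necessarily what data_target_separator does.

```python
import numpy as np


def split_data_target(dataset):
    """Hypothetical stand-in for data_target_separator."""
    dataset = np.asarray(dataset)
    if dataset.ndim != 2 or dataset.shape[1] < 2:
        raise ValueError("expected a 2-D array with at least two columns")
    data = dataset[:, :-1]   # every column except the last
    target = dataset[:, -1]  # the last column holds the labels
    return data, target


dataset = np.array([[5.1, 3.5, 0], [4.9, 3.0, 0], [6.2, 2.9, 1]])
data, target = split_data_target(dataset)
print(data.shape, target.shape)
```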
- mlops
  - learnfromsample: An important module that takes a training set and a test set as inputs, along with parameters such as the sample size, sampling method, and classifier; a scaler is optional. It returns the true and predicted labels for both the test set and the training set, plus the fitting time and prediction time of the model under examination.
  - Visualization: Reduces the dimensionality of a dataset so that it can be visualized in 2D or 3D.
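
The workflow described for learnfromsample might look roughly like the sketch below. Everything here is an assumption made for illustration: the function name `learn_from_sample`, its signature, the toy `MajorityClassifier`, the use of simple random sampling only, and the choice to time prediction on the test set. The classifier and optional scaler are assumed to follow the scikit-learn style `fit`/`predict` and `fit_transform`/`transform` conventions.

```python
import time

import numpy as np


class MajorityClassifier:
    """Toy classifier (not part of the package): predicts the most common training label."""

    def fit(self, X, y):
        labels, counts = np.unique(y, return_counts=True)
        self.label_ = labels[np.argmax(counts)]
        return self

    def predict(self, X):
        return np.full(len(X), self.label_)


def learn_from_sample(X_train, y_train, X_test, y_test,
                      sample_size, classifier, scaler=None, seed=0):
    """Hypothetical sketch: fit on a random sample of the training set and time both phases."""
    rng = np.random.default_rng(seed)
    # Simple random sampling without replacement; other sampling methods are omitted here.
    idx = rng.choice(len(X_train), size=sample_size, replace=False)
    X_sample, y_sample = X_train[idx], y_train[idx]

    if scaler is not None:
        # Fit the scaler on the sample only, then apply it to the test set.
        X_sample = scaler.fit_transform(X_sample)
        X_test = scaler.transform(X_test)

    start = time.perf_counter()
    classifier.fit(X_sample, y_sample)
    fit_time = time.perf_counter() - start

    # Prediction time is measured on the test set (an assumption).
    start = time.perf_counter()
    y_test_pred = classifier.predict(X_test)
    predict_time = time.perf_counter() - start

    y_sample_pred = classifier.predict(X_sample)
    return y_test, y_test_pred, y_sample, y_sample_pred, fit_time, predict_time


X_train = np.arange(20, dtype=float).reshape(10, 2)
y_train = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
X_test = np.arange(8, dtype=float).reshape(4, 2)
y_test = np.array([0, 0, 1, 0])

results = learn_from_sample(X_train, y_train, X_test, y_test,
                            sample_size=6, classifier=MajorityClassifier())
y_true, y_pred, y_sample, y_sample_pred, fit_time, predict_time = results
print(y_pred, fit_time, predict_time)
```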
- sampling
  This package contains probability-based sampling modules, which work with numpy datasets:
  - ClusterSampling
  - RandomSampling
  - StratifiedSampling
  - SystematicSampling
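
For readers unfamiliar with the four sampling schemes named above, the following sketch shows one common way to express each of them as row-index selection on a numpy dataset. The function names, signatures, and details such as proportional allocation per stratum are assumptions for illustration and are not taken from the package's modules.

```python
import numpy as np


def random_sample(n_rows, size, rng):
    """Simple random sampling: every row has the same chance of selection."""
    return np.sort(rng.choice(n_rows, size=size, replace=False))


def systematic_sample(n_rows, size, rng):
    """Systematic sampling: random start, then every k-th row (requires size <= n_rows)."""
    step = n_rows // size
    start = rng.integers(step)
    return np.arange(start, start + step * size, step)


def stratified_sample(labels, fraction, rng):
    """Stratified sampling: draw the same fraction from each label group."""
    chosen = []
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        k = max(1, int(round(fraction * len(members))))
        chosen.append(rng.choice(members, size=k, replace=False))
    return np.sort(np.concatenate(chosen))


def cluster_sample(cluster_ids, n_clusters, rng):
    """Cluster sampling: pick whole clusters at random and keep all of their rows."""
    picked = rng.choice(np.unique(cluster_ids), size=n_clusters, replace=False)
    return np.flatnonzero(np.isin(cluster_ids, picked))


rng = np.random.default_rng(42)
data = np.arange(24).reshape(12, 2)       # 12 rows, 2 features
labels = np.array([0] * 8 + [1] * 4)      # two strata of unequal size
cluster_ids = np.repeat(np.arange(4), 3)  # four clusters of three rows each

random_idx = random_sample(len(data), 4, rng)
systematic_idx = systematic_sample(len(data), 4, rng)
stratified_idx = stratified_sample(labels, 0.5, rng)
cluster_idx = cluster_sample(cluster_ids, 2, rng)

# Each index array selects rows of the dataset, e.g. data[stratified_idx].
print(random_idx, systematic_idx, stratified_idx, cluster_idx)
```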
- misc
  This package contains project-specific modules that work only in this project's scenario.