ktimsclustering/Tims_Portfolio

Natural Language Processing Project:

In this NLP project, I classified Yelp reviews as either 1-star or 5-star based on the text of each review. I used the Yelp Review Dataset from Kaggle, in which each observation is a review of a particular business by a particular user. The "stars" column is the number of stars (1 through 5) the reviewer assigned to the business, while the "cool," "useful," and "funny" columns are ratings of the review itself, not of the business. For the more complex tasks, I used pipeline methods.
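The pipeline approach mentioned above could look roughly like the following minimal sketch. It assumes scikit-learn and a bag-of-words plus Naive Bayes classifier, which the description does not confirm, and the six reviews are made-up stand-ins for the Yelp data.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Made-up reviews standing in for the Yelp dataset (illustrative only).
texts = [
    "terrible service and cold food",
    "worst experience, rude staff",
    "awful, dirty and overpriced",
    "amazing food and friendly staff",
    "best place in town, loved it",
    "fantastic service, great food",
]
stars = [1, 1, 1, 5, 5, 5]

# Chain the text-processing steps and the classifier into one estimator.
# The choice of MultinomialNB is an assumption, not taken from the project.
pipeline = Pipeline([
    ("bow", CountVectorizer()),        # raw text -> token counts
    ("tfidf", TfidfTransformer()),     # token counts -> TF-IDF weights
    ("classifier", MultinomialNB()),   # TF-IDF weights -> 1-star or 5-star
])
pipeline.fit(texts, stars)

predictions = pipeline.predict(["terrible rude awful", "amazing fantastic loved"])
print(list(predictions))
```

Wrapping the steps in a single `Pipeline` means the same transformations are applied at training and prediction time, so raw text can be passed straight to `predict`.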

Decision Trees and Random Forest project:

In this completed LendingClub.com data analysis project, publicly available data from LendingClub.com was explored to predict whether a borrower would pay back their loan in full. LendingClub connects borrowers with investors, and the goal was to create a model that helps investors identify borrowers with a high probability of paying them back. The lending data covers 2007-2010 and contains various features related to the borrower's credit history and financial situation. Borrowers were classified by whether they paid back their loan in full, using both a decision tree and a random forest classifier.
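As a rough sketch of the comparison described above, the following fits a decision tree and a random forest on the same train/test split. It assumes scikit-learn and uses synthetic data from `make_classification` in place of the LendingClub features, so the scores say nothing about the real project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data standing in for the loan data.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Hyperparameters here are illustrative defaults, not the project's settings.
models = {
    "decision tree": DecisionTreeClassifier(random_state=42),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each model on the same split and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```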

K Nearest Neighbor Project:

In this project, a machine learning model was built with the KNN algorithm to predict the class of a new data point from the features of a given classified data set. The data and target classes were provided, but the feature column names were hidden. The model predicts the class of a new data point by finding the K closest data points in the feature space and taking a majority vote among their classes.
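The neighbor search and majority vote described above can be sketched in a few lines of plain Python. This is an illustration of the idea rather than the project's code; the function name `knn_predict`, the toy points, and the use of Euclidean distance are all assumptions.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest points."""
    # Pair each training point's Euclidean distance to the query with its label,
    # then sort so the closest points come first.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # The most common label among the k nearest neighbors wins the vote.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy two-cluster data (hypothetical, for illustration only).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = [0, 0, 0, 1, 1, 1]

prediction_near_origin = knn_predict(points, labels, (1.1, 1.0))
prediction_far = knn_predict(points, labels, (5.1, 5.0))
print(prediction_near_origin, prediction_far)
```

Because KNN relies on distances, features on very different scales are usually standardized before the neighbor search.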

Ecommerce Linear Regression Project:

This linear regression project analyzes customer data for an Ecommerce company based in New York City that sells clothing online and offers in-store style and clothing advice sessions. The objective is to determine whether the company should focus its efforts on the mobile app experience or on the website.
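One way to approach the app-versus-website question is to compare fitted regression coefficients. The sketch below assumes scikit-learn and NumPy; the variable names and the synthetic data are hypothetical, and the data is deliberately constructed so that app time matters more, so the result illustrates the method, not the project's finding.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200

# Hypothetical features: minutes spent on the app and on the website.
time_on_app = rng.normal(12, 1, n)
time_on_website = rng.normal(37, 1, n)

# Synthetic target built so that spending depends mostly on app time.
yearly_spend = 40 * time_on_app + 2 * time_on_website + rng.normal(0, 5, n)

X = np.column_stack([time_on_app, time_on_website])
model = LinearRegression().fit(X, yearly_spend)

# Each coefficient estimates the change in spending per extra minute,
# holding the other feature fixed.
app_coef, website_coef = model.coef_
print(f"app: {app_coef:.2f}, website: {website_coef:.2f}")
```

A larger coefficient can be read two ways: invest in the channel that already drives spending, or improve the weaker one so it catches up.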

Finance Project:

This data project is an exploratory analysis of stock prices, focusing on bank stocks and how they performed from the financial crisis of 2007-2008 through early 2016. It is meant as practice with pandas and visualization skills, and is not intended to be a comprehensive financial analysis or to provide financial advice. Analyzing and visualizing the data gives insight into the trends and patterns that emerged during this period, including the impact of the financial crisis on bank stocks.
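A typical first step in this kind of analysis is turning closing prices into daily returns. The sketch below assumes pandas and uses two made-up tickers, `BANK_A` and `BANK_B`, with invented prices; none of the numbers come from the project.

```python
import pandas as pd

# Invented closing prices for two hypothetical bank tickers.
dates = pd.date_range("2008-09-01", periods=5, freq="D")
close = pd.DataFrame(
    {
        "BANK_A": [100.0, 102.0, 96.9, 98.0, 99.0],
        "BANK_B": [50.0, 49.0, 49.5, 40.0, 42.0],
    },
    index=dates,
)

# Daily percentage change; the first row has no previous day, so drop it.
returns = close.pct_change().dropna()

# Date of each ticker's worst single-day return.
worst_day = returns.idxmin()

# Standard deviation of returns as a simple measure of volatility.
volatility = returns.std()

print(worst_day)
print(volatility)
```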

911 Calls Project:

The 911 Calls Capstone Project is an exploratory data analysis (EDA) project that uses a dataset containing information about emergency calls made to the 911 service. The project involves analyzing the data to uncover insights and trends related to the calls made, such as the time of day, day of the week, and type of emergency. Seaborn, a powerful plotting library, is employed to create visualizations that help to identify patterns and relationships within the data.
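Breaking calls down by time of day, day of the week, and type of emergency usually starts with a little feature extraction. The sketch below assumes pandas; the column names `timeStamp` and `title`, the "Reason: detail" title format, and the four sample rows are assumptions made for illustration.

```python
import pandas as pd

# Made-up call records; column names and title format are assumptions.
calls = pd.DataFrame(
    {
        "timeStamp": [
            "2015-12-10 17:40:00",
            "2015-12-10 17:45:00",
            "2015-12-11 08:10:00",
            "2015-12-12 23:05:00",
        ],
        "title": [
            "EMS: BACK PAINS/INJURY",
            "Fire: GAS-ODOR/LEAK",
            "Traffic: VEHICLE ACCIDENT",
            "EMS: CARDIAC EMERGENCY",
        ],
    }
)

# Parse the timestamp strings so the .dt accessor is available.
calls["timeStamp"] = pd.to_datetime(calls["timeStamp"])
calls["hour"] = calls["timeStamp"].dt.hour
calls["day_of_week"] = calls["timeStamp"].dt.day_name()

# Take the text before the colon as the broad type of emergency.
calls["reason"] = calls["title"].str.split(":").str[0]

calls_by_reason = calls["reason"].value_counts()
print(calls_by_reason)
```

Columns like `hour`, `day_of_week`, and `reason` can then be passed to Seaborn, for example as the `x` or `hue` argument of a count plot.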

Multiclass classification of COVID-19 X-ray dataset using CNN:

This code initializes a Convolutional Neural Network (CNN) model in Python using the Keras API with the TensorFlow backend. The model consists of several layers, including Conv2D, MaxPooling2D, and Dense layers. The first layer is a convolutional layer with 32 filters, a 3x3 kernel, and the ReLU activation function, followed by a max pooling layer with a 2x2 pool size. Two more convolutional and max pooling layer pairs follow, with 64 and 128 filters, respectively. The output of the convolutional layers is flattened and passed to a fully connected layer with 128 units and the ReLU activation function. Finally, an output layer with the softmax activation function performs the multiclass classification. The model is compiled with the categorical cross-entropy loss function and the Adam optimizer.
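The architecture described above could be written roughly as follows. This is a sketch, not the project's code: the input shape, the number of classes, and the ReLU activation and 3x3 kernels on the second and third convolutional layers are assumptions, since the description does not state them.

```python
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 3               # assumption: number of X-ray categories
INPUT_SHAPE = (150, 150, 3)   # assumption: image height, width, channels

model = keras.Sequential(
    [
        keras.Input(shape=INPUT_SHAPE),
        # First block: 32 filters, 3x3 kernel, ReLU, then 2x2 max pooling.
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        # Two further blocks with 64 and 128 filters.
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        # Flatten the feature maps and classify.
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ]
)

# Categorical cross-entropy expects one-hot encoded labels.
model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
model.summary()
```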

About

Data science portfolio
