arrudamichel/README.md

Michel Arruda - Data Scientist | Machine Learning Engineer

Data Scientist and Machine Learning Engineer with 8 years of experience building, deploying, and optimizing predictive models and intelligent systems. Specialized in machine learning, deep learning, data engineering, and MLOps, with strong expertise in Python, scikit-learn, PyTorch, and SQL.

Key Skills:

  • Development and optimization of predictive models (Regression and Classification)
  • Feature engineering and large-scale data preparation
  • Implementation of machine learning pipelines in production (MLOps, CI/CD)
  • Database architecture and optimization (SQL, NoSQL)
  • Large-scale data processing and analysis (Spark, Pandas, Dask)
  • APIs for machine learning models (FastAPI, Flask, Docker)
  • Integration of models into enterprise systems
  • Cloud deployment (AWS, Azure)
  • Collaboration with product and engineering teams for data-driven solutions

Interests and Recent Studies:

  • Application of NLP techniques using LLMs, prompt engineering, LangChain, and LangGraph
  • Computer vision and reinforcement learning with PyTorch

Education:

  • I'm currently a student in the Postgraduate Certificate in Artificial Intelligence and Machine Learning
  • I hold a Master's degree in Computer and Software Engineering from COPPE/UFRJ
  • I graduated in Computer Science from UFRRJ

Projects:

  • LLM-Powered API for Document Query and Sentiment Detection

    2025 · Data Science Challenge (CGU)

    Developed a robust API using FastAPI with three endpoints focused on document question answering (RAG), PDF embedding, and sentiment classification. The architecture is designed for scalability, high performance, and LLM integration.

    Key goals:

    • Upload and process PDFs to extract embeddings and store them in a vector database
    • Enable RAG-based answers to user questions using local LLMs
    • Implement sentiment classification using logprobs from open-source language models
    • Deploy scalable API architecture using FastAPI, Queues, Kubernetes and Workers
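
The logprob-based sentiment step can be illustrated with a minimal sketch. It assumes the model returns one log-probability per candidate label token; the label names and numbers below are hypothetical and not taken from the project. The sketch only shows how such log-probabilities could be normalized into a distribution and a label picked from it.

```python
import math


def label_probabilities(label_logprobs):
    """Normalize per-label log-probabilities into a probability distribution."""
    # Subtract the largest logprob before exponentiating for numerical stability.
    peak = max(label_logprobs.values())
    weights = {label: math.exp(lp - peak) for label, lp in label_logprobs.items()}
    total = sum(weights.values())
    return {label: weight / total for label, weight in weights.items()}


def classify_sentiment(label_logprobs):
    """Return the most likely label and its normalized probability."""
    probs = label_probabilities(label_logprobs)
    best = max(probs, key=probs.get)
    return best, probs[best]


# Hypothetical logprobs a model might assign to each candidate label token.
example = {"positive": -0.22, "negative": -2.10, "neutral": -2.90}
label, confidence = classify_sentiment(example)
print(label, round(confidence, 3))
```

Normalizing over only the candidate labels gives a confidence value that can be thresholded, which is one reason to read logprobs instead of parsing free-form generated text.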

    Architecture Highlights:

    • API Gateway with OAuth2, JWT, HTTPS encryption and rate limiting
    • Asynchronous task queue with specialized GPU workers for LLM inference
    • RAG with LangChain + Llama 3.1 via Ollama, using MiniLM for embeddings
    • Vector database: ChromaDB
    • Chunking strategy: RecursiveCharacterTextSplitter
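
The idea behind recursive character splitting can be sketched in plain Python. This is a simplified illustration and not LangChain's implementation: the function name `recursive_split`, the separator order, and the absence of chunk overlap are all assumptions made for the example. It tries coarse separators first and falls back to finer ones only for pieces that are still too long.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters,
    preferring coarse separators and recursing into finer ones."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut at a fixed width.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            # Keep merging neighbouring parts while they fit.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= chunk_size:
            current = part
        else:
            # This part alone is too long, so retry it with finer separators.
            chunks.extend(recursive_split(part, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


sample = "alpha beta gamma\n\ndelta epsilon\nzeta eta theta iota"
chunks = recursive_split(sample, 16)
print(chunks)
```

Splitting on paragraph and line boundaries before falling back to spaces tends to keep related sentences in the same chunk, which matters when each chunk is embedded and retrieved on its own.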

    Tech Focus: FastAPI · LangChain · Ollama · ChromaDB · LLM · Asynchronous Queues · Vector Search · Kubernetes · Sentiment Analysis · BM25 · API Deployment

    🔗 GitHub Repository

  • Sales Forecasting and Discount Analysis

    2025 · Data Science Challenge

    Analyzed historical sales and discount patterns across 45 stores (Feb 2010 – Oct 2012). The project focused on delivering accurate forecasts and actionable insights for business strategy.

    Key goals:

    • Forecast department-level sales for the next year
    • Recommend high-impact business actions based on insights
    • Model discount effects during holiday weeks
    • Provide a sales forecast API for the next 4 weeks

    Tech Focus: Regression Models · Data Analysis · Scikit-Learn · API Deployment

    🔗 GitHub Repository

    Presentation

  • Monthly Sales Prediction Model

    2025 · Data Science Project

    Developed a machine learning model to predict monthly sales for new leads, using features such as average real visits, followers, and estimated sales by domain. The project involved feature engineering, model selection, and evaluation.

    The best-performing model was a Decision Tree, achieving an R² score of 0.78 and capturing key sales patterns relevant to business decisions.
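
For readers unfamiliar with the metric, R² is one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it in plain Python; the function name `r_squared` and the sample values are made up for illustration and are not the project's data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1 - ss_res / ss_tot


# Made-up monthly sales values, for illustration only.
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
score = r_squared(actual, predicted)
print(round(score, 3))
```

A score of 1 means the predictions match the targets exactly, while a score of 0 means the model does no better than always predicting the mean.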

    Role: Data Scientist

    Tech Focus: Regression Models · Feature Engineering · Scikit-Learn

    🔗 GitHub Repository

    Presentation

  • LLM - Spiritism Chat

    2024 · Personal Project

    Built a conversational AI to discuss and explore the Spiritist doctrine, based on the works of Allan Kardec. The system uses Python and LangChain, loading Kardec's texts to provide contextual, doctrine-based responses.

    An API was also developed to integrate the model with a web interface (Streamlit), enabling real-time interactions.

    Tech Focus: LLMs · LangChain · RAG · Python · API Development · Streamlit

    🔗 GitHub Repository

  • A tool for analyzing patterns in hashtags on Twitter

    Feb 2016 · Academic Project · Federal Rural University of Rio de Janeiro

    Developed a data mining tool to identify patterns in Twitter hashtags, addressing the lack of analytical tools for large-scale social media data. The system extracts, processes, and visualizes insights using custom workflows.

    In a case study with hashtags #foraDilma and #foraCunha, the tool revealed political associations and public sentiment patterns during Brazil’s 2016 political crisis.

    Tech Focus: Data Mining · Text Processing · Data Analysis · KNIME · Workflow Automation

    🔗 GitHub Repository


Courses

  • Domine LLMs com LangChain

    This course explores Generative AI with LLMs, combining LangChain and Python to build AI applications like custom chatbots and virtual assistants. It works with models like ChatGPT, Llama, and Phi, using techniques such as RAG and embeddings. Practical projects include document interaction, video summarization, and intuitive interfaces with Streamlit.

    Course link: https://www.udemy.com/course/domine-llms-com-langchain

    My Repository: https://github.com/arrudamichel/course__domine_llms_com_langchain

  • Deep Learning de A a Z com PyTorch e Python

    Deep Learning is a field focused on applying artificial neural networks to solve complex problems that require advanced computational techniques. This course provides both theoretical and hands-on experience with state-of-the-art Deep Learning methods using the PyTorch library in Python. It teaches how to build artificial neural networks for real-world applications, including image classification, stock price prediction, and automatic image generation. The course covers key topics such as convolutional neural networks, recurrent neural networks, autoencoders, generative adversarial networks, transfer learning, and style transfer. Designed for all levels, it includes fundamental lessons for beginners and practical projects to reinforce learning.

    Course link: https://www.udemy.com/course/formacao-deep-learning-pytorch-python

    My Repository: https://github.com/arrudamichel/course__deep_learning_deAaZ_Pytorch_Python

  • Deep Learning Profissional com PyTorch

    Master Deep Learning with PyTorch in this intensive, hands-on course designed to equip you with the skills to build, train, and deploy advanced neural networks. From fundamental tensor operations to optimizing complex models, you gain practical experience through real-world projects while exploring the latest AI innovations.

    Course link: https://www.udemy.com/course/deep-learning-profissional-com-pytorch

    My Repository:

  • Deep Learning Profissional com PyTorch

    Discover the power of Reinforcement Learning and Deep Learning with this intermediate-level course on building a virtual self-driving car using PyTorch and Python. Learn the fundamentals of artificial neural networks, explore the concepts of reinforcement learning with Deep Q-Learning and train an autonomous vehicle using modern deep learning techniques. This course combines theoretical foundations with hands-on projects, providing you with the necessary tools to model and implement complex AI-driven solutions.

    Course link: https://www.udemy.com/course/aprendizagem-reforco-deep-learning-pytorch-python

    My Repository:

Connect with me:

arruda_michel | Instagram



Pinned

  1. ds_challenge_ml (Public)

    A solution to a machine learning data science challenge.

    Jupyter Notebook

  2. project__graduate_tcc_twitter_hashtag (Public)

    A tool for analyzing patterns in Twitter hashtags using KNIME.

  3. project__spiritism_chat (Public)

    Repository for the Spiritism Chat, a chatbot that answers questions about Spiritism as codified by Allan Kardec.

    Jupyter Notebook

  4. project__lead_monthly_sales_predict (Public)

    Jupyter Notebook

  5. hugdiniz/recMusic (Public)

    JavaScript