Data Scientist and Machine Learning Engineer with 8 years of experience building, deploying, and optimizing predictive models and intelligent systems. Specialized in machine learning, deep learning, data engineering, and MLOps, with strong expertise in Python, scikit-learn, PyTorch, and SQL.
- Development and optimization of predictive models (Regression and Classification)
- Feature engineering and large-scale data preparation
- Implementation of machine learning pipelines in production (MLOps, CI/CD)
- Database architecture and optimization (SQL, NoSQL)
- Large-scale data processing and analysis (Spark, Pandas, Dask)
- APIs for machine learning models (FastAPI, Flask, Docker)
- Integration of models into enterprise systems
- Cloud deployment (AWS, Azure)
- Collaboration with product and engineering teams for data-driven solutions
Interests and Recent Studies:
- Application of NLP techniques using LLMs, prompt engineering, LangChain, and LangGraph
- Computer vision and reinforcement learning with PyTorch
- Currently enrolled in a Postgraduate Certificate program in Artificial Intelligence and Machine Learning
- Master's degree in Computer and Software Engineering from COPPE/UFRJ
- Graduated in Computer Science from UFRRJ
-
LLM-Powered API for Document Query and Sentiment Detection
2025 · Data Science Challenge (CGU)
Developed a robust API using FastAPI with three endpoints focused on document question answering (RAG), PDF embedding, and sentiment classification. The architecture is designed for scalability, high performance, and LLM integration.
Key goals:
- Upload and process PDFs to extract embeddings and store them in a vector database
- Enable RAG-based answers to user questions using local LLMs
- Implement sentiment classification using logprobs from open-source language models
- Deploy scalable API architecture using FastAPI, Queues, Kubernetes and Workers
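The logprob-based sentiment goal can be illustrated with a minimal sketch. It assumes the model exposes a log-probability for each candidate label token; the function name `sentiment_from_logprobs` and the example values are hypothetical and not taken from the project.

```python
import math

def sentiment_from_logprobs(label_logprobs):
    """Normalize per-label log-probabilities and return the most likely label."""
    # Subtract the largest logprob before exponentiating, for numerical stability.
    peak = max(label_logprobs.values())
    weights = {label: math.exp(lp - peak) for label, lp in label_logprobs.items()}
    total = sum(weights.values())
    # Renormalize so the probabilities over the candidate labels sum to 1.
    probs = {label: w / total for label, w in weights.items()}
    best = max(probs, key=probs.get)
    return best, probs

# Hypothetical logprobs for the first generated token, restricted to three labels.
example = {"positive": -0.35, "negative": -1.60, "neutral": -2.90}
label, probs = sentiment_from_logprobs(example)
print(label, round(probs[label], 3))
```

Renormalizing over the candidate labels only gives a confidence score that can be thresholded, which a plain text completion would not provide.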
Architecture Highlights:
- API Gateway with OAuth2, JWT, HTTPS encryption and rate limiting
- Asynchronous task queue with specialized GPU workers for LLM inference
- RAG with LangChain + Llama 3.1 via Ollama, using MiniLM for embeddings
- Vector database: ChromaDB
- Chunking strategy: RecursiveCharacterTextSplitter
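The idea behind recursive character splitting can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation: it omits chunk overlap, and the function name `recursive_split` and the sample text are hypothetical. The separator order mirrors the splitter's documented defaults.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters,
    trying coarse separators (paragraphs) before finer ones (words)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # Keep merging pieces while the chunk still fits.
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: recurse with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks

sample = ("Para one is short.\n\n"
          "Para two is a bit longer and needs splitting on spaces.\n\n"
          "End.")
chunks = recursive_split(sample, chunk_size=30)
for chunk in chunks:
    print(repr(chunk))
```

Preferring paragraph and line boundaries keeps related sentences in the same chunk, which tends to help retrieval quality when the chunks are embedded.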
Tech Focus: FastAPI · LangChain · Ollama · ChromaDB · LLM · Asynchronous Queues · Vector Search · Kubernetes · Sentiment Analysis · BM25 · API Deployment
-
Sales Forecasting and Discount Analysis
2025 · Data Science Challenge
Analyzed historical sales and discount patterns across 45 stores (Feb 2010 – Oct 2012). The project focused on delivering accurate forecasts and actionable insights for business strategy.
Key goals:
- Forecast department-level sales for the next year
- Recommend high-impact business actions based on insights
- Model discount effects during holiday weeks
- Provide a sales forecast API for the next 4 weeks
Tech Focus: Regression Models · Data Analysis · Scikit-Learn · API Deployment
-
Monthly Sales Prediction Model
2025 · Data Science Project
Developed a machine learning model to predict monthly sales for new leads, using features such as average real visits, followers, and estimated sales by domain. The project involved feature engineering, model selection, and evaluation.
The best-performing model was a Decision Tree, achieving an R² score of 0.78 and capturing key sales patterns relevant to business decisions.
Role: Data Scientist
Tech Focus: Regression Models · Feature Engineering · Scikit-Learn
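For context on the reported R² score, the metric can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below is a plain-Python illustration with hypothetical data; the function name `r_squared` is an assumption, and in practice scikit-learn's `sklearn.metrics.r2_score` does the same job.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of a baseline that always predicts the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical monthly sales and predictions, for illustration only.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(round(r_squared(actual, predicted), 3))
```

A score of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean of the observed values.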
-
LLM - Spiritism Chat
2024 · Personal Project
Built a conversational AI to discuss and explore the Spiritist doctrine, based on the works of Allan Kardec. The system uses Python and LangChain, loading Kardec's texts to provide contextual, doctrine-based responses.
An API was also developed to integrate the model with a web interface (Streamlit), enabling real-time interactions.
Tech Focus: LLMs · LangChain · RAG · Python · API Development · Streamlit
-
A tool for analyzing patterns in hashtags on Twitter
Feb 2016 · Academic Project · Federal Rural University of Rio de Janeiro
Developed a data mining tool to identify patterns in Twitter hashtags, addressing the lack of analytical tools for large-scale social media data. The system extracts, processes, and visualizes insights using custom workflows.
In a case study with hashtags #foraDilma and #foraCunha, the tool revealed political associations and public sentiment patterns during Brazil’s 2016 political crisis.
Tech Focus: Data Mining · Text Processing · Data Analysis · KNIME · Workflow Automation
-
Domine LLMs com LangChain
This course explores Generative AI with LLMs, combining LangChain and Python to build AI applications like custom chatbots and virtual assistants. It works with models like ChatGPT, Llama, and Phi, using techniques such as RAG and embeddings. Practical projects include document interaction, video summarization, and intuitive interfaces with Streamlit.
Course link: https://www.udemy.com/course/domine-llms-com-langchain
My Repository: https://github.com/arrudamichel/course__domine_llms_com_langchain
-
Deep Learning de A a Z com PyTorch e Python
Deep Learning is a field focused on applying artificial neural networks to solve complex problems that require advanced computational techniques. This course provides both theoretical and hands-on experience with state-of-the-art Deep Learning methods using the PyTorch library in Python. It teaches how to build artificial neural networks for real-world applications, including image classification, stock price prediction, and automatic image generation. The course covers key topics such as convolutional neural networks, recurrent neural networks, autoencoders, generative adversarial networks, transfer learning, and style transfer. Designed for all levels, it includes fundamental lessons for beginners and practical projects to reinforce learning.
Course link: https://www.udemy.com/course/formacao-deep-learning-pytorch-python
My Repository: https://github.com/arrudamichel/course__deep_learning_deAaZ_Pytorch_Python
-
Deep Learning Profissional com PyTorch
An intensive, hands-on course on Deep Learning with PyTorch, designed to build the skills needed to create, train, and deploy advanced neural networks. From fundamental tensor operations to optimizing complex models, it offers practical experience through real-world projects while exploring the latest AI innovations.
Course link: https://www.udemy.com/course/deep-learning-profissional-com-pytorch
My Repository:
-
Aprendizagem por Reforço com Deep Learning, PyTorch e Python
An intermediate-level course on Reinforcement Learning and Deep Learning, built around a virtual self-driving car implemented with PyTorch and Python. It covers the fundamentals of artificial neural networks, the concepts of reinforcement learning with Deep Q-Learning, and the training of an autonomous vehicle using modern deep learning techniques. The course combines theoretical foundations with hands-on projects, providing the tools needed to model and implement complex AI-driven solutions.
Course link: https://www.udemy.com/course/aprendizagem-reforco-deep-learning-pytorch-python
My Repository:
