List of applied ML papers and blog posts merged from {Eugene Yan, Evidently AI, and personal search} with simple progress tracking and paper recommendation

paul-cayet/applied-ml-with-recommendation

applied-ml-extended

A small list of 811 applied ML papers/blog posts, collected from Eugene Yan, Evidently AI, and personal search.

+ Extremely simple progress update with random recommendation, for personal use.

❌ need to read the paper/blog post
✅ have read the paper/blog post

Replace ❌ with ✅ when needed, and run

python update_readme.py -f README.md

to update the progress and paper recommendation.
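For readers curious how such a progress update could work, here is a minimal sketch. It is not the repository's actual `update_readme.py`, whose source is not shown here. It assumes section headings start with `## ` and that every tracked entry starts with ❌ or ✅; the function name `summarize` and its output format (modelled on the Progress lines below) are hypothetical.

```python
import random

READ, UNREAD = "✅", "❌"


def summarize(lines, rng=random):
    """Count read/unread entries per section and pick one unread entry at random."""
    sections = {}  # section name -> {"read": int, "unread": [titles]}
    current = None
    for line in lines:
        text = line.strip()
        if text.startswith("## "):  # assumed heading syntax
            current = text[3:].strip()
            sections[current] = {"read": 0, "unread": []}
        elif current is not None and text.startswith(READ):
            sections[current]["read"] += 1
        elif current is not None and text.startswith(UNREAD):
            # Keep the title (marker stripped) as a recommendation candidate.
            sections[current]["unread"].append(text[len(UNREAD):].strip())

    report = []
    for name, stats in sections.items():
        total = stats["read"] + len(stats["unread"])
        if total == 0:
            continue  # heading without tracked entries
        percent = round(100 * stats["read"] / total)
        entry = f"{name}: {percent}% ({stats['read']}/{total})"
        if stats["unread"]:
            entry += f", recommendation: {rng.choice(stats['unread'])}"
        report.append(entry)
    return report
```

Calling `summarize(open("README.md", encoding="utf-8"))` would return one progress line per section. Passing a seeded `random.Random` as `rng` makes the recommendation reproducible.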

Note:
Exact duplicates were removed, but some duplicates may remain.

Recommended Paper
How to Optimise Rankings with Cascade Bandits Expedia 2022

Progress
Data Quality: 0% (0/8), recommendation: An Approach to Data Quality for Netflix Personalization Systems Netflix 2020
Data Engineering: 7% (1/14), recommendation: How DoorDash is Scaling its Data Platform to Delight Customers and Meet Growing Demand DoorDash 2020
Data Discovery: 0% (0/17), recommendation: Amundsen: One Year Later Lyft 2020
Feature Stores: 5% (1/21), recommendation: Developing scalable feature engineering DAGs Metaflow + Hamilton via Outerbounds 2022
Classification: 0% (0/30), recommendation: Teaching Machines to Triage Firefox Bugs Mozilla 2019
Regression: 14% (3/21), recommendation: Predicting Food Delivery Time at Cart Swiggy 2023
Forecasting: 3% (1/33), recommendation: Instacart’s Item Availability Architecture: Solving for scale and consistency Instacart 2023
Recommendation: 0% (0/160), recommendation: Nightingale: Scalable Daily Sales Email Sending Decision Model Wayfair 2022
Search & Ranking: 1% (1/99), recommendation: AI at Scale in Bing Microsoft 2020
Embeddings: 0% (0/19), recommendation: Multi-objective Hyper-parameter Optimization of Behavioral Song Embeddings (Paper) Apple 2022
Natural Language Processing: 0% (0/58), recommendation: Toward a full-scale neural machine translation in production: the Booking.com use case
Sequence Modelling: 0% (0/11), recommendation: Using deep learning to detect abusive sequences of member activity (Video) LinkedIn 2021
Computer Vision: 0% (0/42), recommendation: A Neural Weather Model for Eight-Hour Precipitation Forecasting (Paper) Google 2020
Reinforcement Learning: 0% (0/12), recommendation: Building AI Trading Systems Denny Britz 2020
Anomaly Detection: 5% (1/20), recommendation: Fighting fraud with Triplet Loss OLX Group 2020
Graph: 0% (0/16), recommendation: Traffic Prediction with Advanced Graph Neural Networks DeepMind 2020
Optimization: 0% (0/53), recommendation: Automated detection, diagnosis & remediation of app failure Capital One 2021
Information Extraction: 0% (0/14), recommendation: One-shot Text Labeling using Attention and Belief Propagation for Information Extraction (Paper) Alibaba 2020
Weak Supervision: 0% (0/4), recommendation: Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale (Paper) Google 2019
Generation: 0% (0/20), recommendation: From idea to reality: Elevating our customer support through generative AI Vimeo 2023
Audio: 0% (0/4), recommendation: Voice Reorder Experience: add Multiple Product Items to your shopping cart Walmart 2022
Privacy-Preserving Machine Learning: 0% (0/7), recommendation: LinkedIn Salary: A System for Secure Collection and Presentation of Structured Compensation Insights to Job Seekers
Validation and A/B Testing: 0% (0/52), recommendation: The 4 Principles DoorDash Used to Increase Its Logistics Experiment Capacity by 1000% DoorDash 2021
Model Management: 0% (0/7), recommendation: Managing ML Models @ Scale - Intuit’s ML Platform Intuit 2020
Efficiency: 0% (0/3), recommendation: GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce (Paper) Facebook 2020
Ethics: 0% (0/7), recommendation: Introducing Twitter’s first algorithmic bias bounty challenge Twitter 2021
Infra: 0% (0/2), recommendation: Reengineering Facebook AI’s Deep Learning Platforms for Interoperability Facebook 2020
MLOps Platforms: 0% (0/27), recommendation: Evolution of ML Fact Store Netflix 2022
Practices: 25% (5/20), recommendation: Machine Learning: The High Interest Credit Card of Technical Debt (Paper) Google 2014
Team Structure: 11% (1/9), recommendation: A Behind-the-Scenes Look at How Postman’s Data Team Works Postman 2021
Fails: 71% (5/7), recommendation: A British AI Tool to Predict Violent Crime Is Too Flawed to Use United Kingdom 2020

Table of Contents

  1. Data Quality
  2. Data Engineering
  3. Data Discovery
  4. Feature Stores
  5. Classification
  6. Regression
  7. Forecasting
  8. Recommendation
  9. Search & Ranking
  10. Embeddings
  11. Natural Language Processing
  12. Sequence Modelling
  13. Computer Vision
  14. Reinforcement Learning
  15. Anomaly Detection
  16. Graph
  17. Optimization
  18. Information Extraction
  19. Weak Supervision
  20. Generation
  21. Audio
  22. Privacy-Preserving Machine Learning
  23. Validation and A/B Testing
  24. Model Management
  25. Efficiency
  26. Ethics
  27. Infra
  28. MLOps Platforms
  29. Practices
  30. Team Structure
  31. Fails

Data Quality

Reliable and Scalable Data Ingestion at Airbnb Airbnb 2016
Monitoring Data Quality at Scale with Statistical Modeling Uber 2017
Data Management Challenges in Production Machine Learning (Paper) Google 2017
Automating Large-Scale Data Quality Verification (Paper) Amazon 2018
Meet Hodor — Gojek’s Upstream Data Quality Tool Gojek 2019
Data Validation for Machine Learning (Paper) Google 2019
An Approach to Data Quality for Netflix Personalization Systems Netflix 2020
Improving Accuracy By Certainty Estimation of Human Decisions, Labels, and Raters (Paper) Facebook 2020

Data Engineering

Zipline: Airbnb’s Machine Learning Data Management Platform Airbnb 2018
Sputnik: Airbnb’s Apache Spark Framework for Data Engineering Airbnb 2020
Unbundling Data Science Workflows with Metaflow and AWS Step Functions Netflix 2020
How DoorDash is Scaling its Data Platform to Delight Customers and Meet Growing Demand DoorDash 2020
Revolutionizing Money Movements at Scale with Strong Data Consistency Uber 2020
Zipline - A Declarative Feature Engineering Framework Airbnb 2020
Automating Data Protection at Scale, Part 1 (Part 2) Airbnb 2021
Real-time Data Infrastructure at Uber Uber 2021
Introducing Fabricator: A Declarative Feature Engineering Framework DoorDash 2022
Functions & DAGs: introducing Hamilton, a microframework for dataframe generation Stitch Fix 2021
Optimizing Pinterest’s Data Ingestion Stack: Findings and Learnings Pinterest 2022
Lessons Learned From Running Apache Airflow at Scale Shopify 2022
Data Mesh — A Data Movement and Processing Platform @ Netflix Netflix 2022
Building Scalable Real Time Event Processing with Kafka and Flink DoorDash 2022

Data Discovery

Apache Atlas: Data Governance and Metadata Framework for Hadoop (Code) Apache
Collect, Aggregate, and Visualize a Data Ecosystem's Metadata (Code) WeWork
Discovery and Consumption of Analytics Data at Twitter Twitter 2016
Democratizing Data at Airbnb Airbnb 2017
Databook: Turning Big Data into Knowledge with Metadata at Uber Uber 2018
Metacat: Making Big Data Discoverable and Meaningful at Netflix (Code) Netflix 2018
Amundsen — Lyft’s Data Discovery & Metadata Engine Lyft 2019
Open Sourcing Amundsen: A Data Discovery And Metadata Platform (Code) Lyft 2019
DataHub: A Generalized Metadata Search & Discovery Tool (Code) LinkedIn 2019
Amundsen: One Year Later Lyft 2020
Using Amundsen to Support User Privacy via Metadata Collection at Square Square 2020
Turning Metadata Into Insights with Databook Uber 2020
DataHub: Popular Metadata Architectures Explained LinkedIn 2020
How We Improved Data Discovery for Data Scientists at Spotify Spotify 2020
How We’re Solving Data Discovery Challenges at Shopify Shopify 2020
Nemo: Data discovery at Facebook Facebook 2020
Exploring Data @ Netflix (Code) Netflix 2021

Feature Stores

Distributed Time Travel for Feature Generation Netflix 2016
Building the Activity Graph, Part 2 (Feature Storage Section) LinkedIn 2017
Fact Store at Scale for Netflix Recommendations Netflix 2018
Feature Store: The missing data layer for Machine Learning pipelines? Hopsworks 2018
Introducing Feast: An Open Source Feature Store for Machine Learning (Code) Gojek 2019
Michelangelo Palette: A Feature Engineering Platform at Uber Uber 2019
The Architecture That Powers Twitter's Feature Store Twitter 2019
Accelerating Machine Learning with the Feature Store Service Condé Nast 2019
Feast: Bridging ML Models and Data Gojek 2020
Building a Scalable ML Feature Store with Redis, Binary Serialization, and Compression DoorDash 2020
Rapid Experimentation Through Standardization: Typed AI features for LinkedIn’s Feed LinkedIn 2020
Building a Feature Store Monzo Bank 2020
Butterfree: A Spark-based Framework for Feature Store Building (Code) QuintoAndar 2020
Building Riviera: A Declarative Real-Time Feature Engineering Framework DoorDash 2021
Optimal Feature Discovery: Better, Leaner Machine Learning Models Through Information Theory Uber 2021
ML Feature Serving Infrastructure at Lyft Lyft 2021
Near real-time features for near real-time personalization LinkedIn 2022
Building the Model Behind DoorDash’s Expansive Merchant Selection DoorDash 2022
Open sourcing Feathr – LinkedIn’s feature store for productive machine learning LinkedIn 2022
Developing scalable feature engineering DAGs Metaflow + Hamilton via Outerbounds 2022
Feature Store Design at Constructor Constructor.io 2023

Classification

Prediction of Advertiser Churn for Google AdWords (Paper) Google 2010
High-Precision Phrase-Based Document Classification on a Modern Scale (Paper) LinkedIn 2011
Chimera: Large-scale Classification using Machine Learning, Rules, and Crowdsourcing (Paper) Walmart 2014
Large-scale Item Categorization in e-Commerce Using Multiple Recurrent Neural Networks (Paper) NAVER 2016
Learning to Diagnose with LSTM Recurrent Neural Networks (Paper) Google 2017
Discovering and Classifying In-app Message Intent at Airbnb Airbnb 2019
Teaching Machines to Triage Firefox Bugs Mozilla 2019
How We Built the Good First Issues Feature GitHub 2020
Using ML to Subtype Patients Receiving Digital Mental Health Interventions (Paper) Microsoft 2020
Scalable Data Classification for Security and Privacy (Paper) Facebook 2020
Uncovering Online Delivery Menu Best Practices with Machine Learning DoorDash 2020
Using a Human-in-the-Loop to Overcome the Cold Start Problem in Menu Item Tagging DoorDash 2020
Deep Learning: Product Categorization and Shelving Walmart 2021
Large-scale Item Categorization for e-Commerce (Paper) DianPing, eBay 2012
Semantic Label Representation with an Application on Multimodal Product Categorization Walmart 2022
Building Airbnb Categories with ML and Human-in-the-Loop Airbnb 2022
Scaling Up LLM Reviews for Google Ads Content Moderation
Learning to Learn to Predict Performance Regressions in Production at Meta
Categorizing User Sessions at Pinterest
PBODL: Parallel Bayesian Online Deep Learning for Click-Through Rate Prediction in Tencent Advertising System
Large Scale Long-tailed Product Recognition System at Alibaba
Large-Scale Training System for 100-Million Classification at Alibaba
Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products
Presenting Precog, Nubank’s Real Time Event AI Nubank 2023
Classifying restaurant cuisines with subjective labels Foodpanda 2022
How We Built a (Mostly) Automated System to Solve Credit Card Merchant Classification Brex 2021
From RGB to Descriptive Color Names: Wayfair's in-house color algorithms to improve customer shopping experience. Wayfair 2021
Email Classification Slack 2021
Categorizing user-uploaded documents Scribd 2021
Categorizing Products at Scale Shopify 2020

Regression

Using Machine Learning to Predict Value of Homes On Airbnb Airbnb 2017
Using Machine Learning to Predict the Value of Ad Requests Twitter 2020
Open-Sourcing Riskquant, a Library for Quantifying Risk (Code) Netflix 2020
Solving for Unobserved Data in a Regression Model Using a Simple Data Adjustment DoorDash 2020
A ML based approach to proactive advertiser churn prevention Pinterest 2023
Lifecycle of a Successful ML Product: Reducing Dasher Wait Times DoorDash 2023
Delivery-Date Prediction Wayfair 2023
How DoorDash Upgraded a Heuristic with ML to Save Thousands of Canceled Orders DoorDash 2023
Expedia Group’s Customer Lifetime Value Prediction Model Expedia 2023
Where is my order? — Part I Swiggy 2023
Predicting Food Delivery Time at Cart Swiggy 2023
How ML Powers — When is my order coming? — Part II Swiggy 2023
Machine Learning for Delivery Time Estimation OLX 2023
Using Data Science to Retain Customers Gousto 2022
Sales Pipeline Management with Machine Learning: A Lightweight Two-Layer Ensemble Classifier Framework PayPal 2022
How We Estimate Food Debarkation Time With 'Tensoba' Gojek 2022
The journey to build an explainable AI-driven recommendation system LinkedIn 2022
Identifying High-Intent Buyers Zillow 2022
Cross-Selling Optimization Using Deep Learning PayPal 2021
Optimal drop times using machine learning Picnic 2020
Using Machine Learning to Determine Contact Accuracy Scores ZoomInfo 2019

Forecasting

Engineering Extreme Event Forecasting at Uber with RNN Uber 2017
Forecasting at Uber: An Introduction Uber 2018
Transforming Financial Forecasting with Data Science and Machine Learning at Uber Uber 2018
Under the Hood of Gojek’s Automated Forecasting Tool Gojek 2019
BusTr: Predicting Bus Travel Times from Real-Time Traffic (Paper, Video) Google 2020
Retraining Machine Learning Models in the Wake of COVID-19 DoorDash 2020
Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow (Paper, Code) Atlassian 2020
Introducing Orbit, An Open Source Package for Time Series Inference and Forecasting (Paper, Video, Code) Uber 2021
Greykite: A flexible, intuitive, and fast forecasting library LinkedIn 2021
The history of Amazon’s forecasting algorithm Amazon 2021
DeepETA: How Uber Predicts Arrival Times Using Deep Learning Uber 2022
Forecasting Grubhub Order Volume At Scale Grubhub 2022
Characterizing and Forecasting User Engagement with In-app Action Graph: A Case Study of Snapchat
From Known to Unknown: Knowledge-guided Transformer for Time-Series Sales Forecasting in Alibaba
Greykite: Deploying Flexible Forecasting at Scale at LinkedIn
Demand and ETR Forecasting at Airports Uber 2023
Deep Learning based Forecasting: a case study from the online fashion industry Zalando 2023
How Wayfair uses “Predicted Winners” Models to Accelerate Success for New Products Wayfair 2023
How Instacart Modernized the Prediction of Real Time Availability for Hundreds of Millions of Items While Saving Costs Instacart 2023
How DoorDash Built an Ensemble Learning Model for Time Series Forecasting DoorDash 2023
How DoorDash Improves Holiday Predictions via Cascade ML Approach DoorDash 2023
How Instacart’s Item Availability Evolved Over the Pandemic Instacart 2023
Instacart’s Item Availability Architecture: Solving for scale and consistency Instacart 2023
Forecasting something that never happened: how we estimated past promotions profitability Artefact 2022
Causal Forecasting at Lyft (Part 1) Lyft 2022
Causal Forecasting at Lyft (Part 2) Lyft 2022
“I See Tacos In Your Future”: Order Volume Forecasting at Grubhub Grubhub 2021
Managing Supply and Demand Balance Through Machine Learning DoorDash 2021
Finding the sweet spot Ocado 2021
Marketplace Forecasting: Sales or Demand? Why not both? Let’s find out! Mercado Libre 2021
Modeling the unseen Instacart 2019
Making cohort-based long-term forecasts at Lyft Lyft 2019
Predicting the real-time availability of 200 million grocery items Instacart 2018

Recommendation

Amazon.com Recommendations: Item-to-Item Collaborative Filtering (Paper) Amazon 2003
Netflix Recommendations: Beyond the 5 stars (Part 1) (Part 2) Netflix 2012
How Music Recommendation Works — And Doesn’t Work Spotify 2012
Learning to Rank Recommendations with the k-Order Statistic Loss (Paper) Google 2013
Recommending Music on Spotify with Deep Learning Spotify 2014
Learning a Personalized Homepage Netflix 2015
The Netflix Recommender System: Algorithms, Business Value, and Innovation (Paper) Netflix 2015
Session-based Recommendations with Recurrent Neural Networks (Paper) Telefonica 2016
Deep Neural Networks for YouTube Recommendations YouTube 2016
E-commerce in Your Inbox: Product Recommendations at Scale (Paper) Yahoo 2016
To Be Continued: Helping you find shows to continue watching on Netflix Netflix 2016
Personalized Recommendations in LinkedIn Learning LinkedIn 2016
Personalized Channel Recommendations in Slack Slack 2016
Recommending Complementary Products in E-Commerce Push Notifications (Paper) Alibaba 2017
Artwork Personalization at Netflix Netflix 2017
A Meta-Learning Perspective on Cold-Start Recommendations for Items (Paper) Twitter 2017
Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time (Paper) Pinterest 2017
How 20th Century Fox uses ML to predict a movie audience (Paper) 20th Century Fox 2018
Calibrated Recommendations (Paper) Netflix 2018
Food Discovery with Uber Eats: Recommending for the Marketplace Uber 2018
Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits (Paper) Spotify 2018
Talent Search and Recommendation Systems at LinkedIn: Practical Challenges and Lessons Learned (Paper) LinkedIn 2018
Behavior Sequence Transformer for E-commerce Recommendation in Alibaba (Paper) Alibaba 2019
SDM: Sequential Deep Matching Model for Online Large-scale Recommender System (Paper) Alibaba 2019
Multi-Interest Network with Dynamic Routing for Recommendation at Tmall (Paper) Alibaba 2019
Personalized Recommendations for Experiences Using Deep Learning TripAdvisor 2019
Powered by AI: Instagram’s Explore recommender system Facebook 2019
Marginal Posterior Sampling for Slate Bandits (Paper) Netflix 2019
Music recommendation at Spotify Spotify 2019
Using Machine Learning to Predict what File you Need Next (Part 1) Dropbox 2019
Using Machine Learning to Predict what File you Need Next (Part 2) Dropbox 2019
Learning to be Relevant: Evolution of a Course Recommendation System (Paper) LinkedIn 2019
Temporal-Contextual Recommendation in Real-Time (Paper) Amazon 2020
P-Companion: A Framework for Diversified Complementary Product Recommendation (Paper) Amazon 2020
Deep Interest with Hierarchical Attention Network for Click-Through Rate Prediction (Paper) Alibaba 2020
TPG-DNN: A Method for User Intent Prediction with Multi-task Learning (Paper) Alibaba 2020
PURS: Personalized Unexpected Recommender System for Improving User Satisfaction (Paper) Alibaba 2020
Controllable Multi-Interest Framework for Recommendation (Paper) Alibaba 2020
MiNet: Mixed Interest Network for Cross-Domain Click-Through Rate Prediction (Paper) Alibaba 2020
ATBRG: Adaptive Target-Behavior Relational Graph Network for Effective Recommendation (Paper) Alibaba 2020
For Your Ears Only: Personalizing Spotify Home with Machine Learning Spotify 2020
Reach for the Top: How Spotify Built Shortcuts in Just Six Months Spotify 2020
Contextual and Sequential User Embeddings for Large-Scale Music Recommendation (Paper) Spotify 2020
The Evolution of Kit: Automating Marketing Using Machine Learning Shopify 2020
A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 1) LinkedIn 2020
A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 2) LinkedIn 2020
Building a Heterogeneous Social Network Recommendation System LinkedIn 2020
How TikTok recommends videos #ForYou ByteDance 2020
Zero-Shot Heterogeneous Transfer Learning from RecSys to Cold-Start Search Retrieval (Paper) Google 2020
Improved Deep & Cross Network for Feature Cross Learning in Web-scale LTR Systems (Paper) Google 2020
Mixed Negative Sampling for Learning Two-tower Neural Networks in Recommendations (Paper) Google 2020
Future Data Helps Training: Modeling Future Contexts for Session-based Recommendation (Paper) Tencent 2020
A Case Study of Session-based Recommendations in the Home-improvement Domain (Paper) Home Depot 2020
Balancing Relevance and Discovery to Inspire Customers in the IKEA App (Paper) Ikea 2020
How we use AutoML, Multi-task learning and Multi-tower models for Pinterest Ads Pinterest 2020
Multi-task Learning for Related Products Recommendations at Pinterest Pinterest 2020
Improving the Quality of Recommended Pins with Lightweight Ranking Pinterest 2020
Multi-task Learning and Calibration for Utility-based Home Feed Ranking Pinterest 2020
Personalized Cuisine Filter Based on Customer Preference and Local Popularity DoorDash 2020
How We Built a Matchmaking Algorithm to Cross-Sell Products Gojek 2020
Lessons Learned Addressing Dataset Bias in Model-Based Candidate Generation (Paper) Twitter 2021
Self-supervised Learning for Large-scale Item Recommendations (Paper) Google 2021
Deep Retrieval: End-to-End Learnable Structure Model for Large-Scale Recommendations (Paper) ByteDance 2021
Using AI to Help Health Experts Address the COVID-19 Pandemic Facebook 2021
Advertiser Recommendation Systems at Pinterest Pinterest 2021
On YouTube's Recommendation System YouTube 2021
"Are you sure?": Preliminary Insights from Scaling Product Comparisons to Multiple Shops Coveo 2021
Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training (Paper) Meta 2021
Personalized complementary product recommendation (Paper) Amazon 2022
Building a Deep Learning Based Retrieval System for Personalized Recommendations eBay 2022
How We Built: An Early-Stage Machine Learning Model for Recommendations Peloton 2022
Lessons Learned from Building out Context-Aware Recommender Systems Peloton 2022
Improving job matching with machine-learned activity features LinkedIn 2022
Blueprints for recommender system architectures: 10th anniversary edition Xavier Amatriain 2022
How Pinterest Leverages Realtime User Actions in Recommendation to Boost Homefeed Engagement Volume Pinterest 2022
RecSysOps: Best Practices for Operating a Large-Scale Recommender System Netflix 2022
Recommend API: Unified end-to-end machine learning infrastructure to generate recommendations Slack 2022
Evolving DoorDash’s Substitution Recommendations Algorithm DoorDash 2022
Homepage Recommendation with Exploitation and Exploration DoorDash 2022
GPU-accelerated ML Inference at Pinterest Pinterest 2022
Addressing Confounding Feature Issue for Causal Recommendation (Paper) Tencent 2022
Optimizing Airbnb Search Journey with Multi-task Learning
Recommendations and Results Organization in Netflix Search
Augmenting Netflix Search with In-Session Adapted Recommendations
Human Curation and Convnets: Powering Item-to-Item Recommendations on Pinterest
POG: Personalized Outfit Generation for Fashion Recommendation at Alibaba iFashion
Context-aware Deep Model for Entity Recommendation in Search Engine at Alibaba
Amazon Books Rating prediction & Recommendation Model
Building a mind reader at Swiggy using Data Science Swiggy 2023
Menu Ranking Foodpanda 2023
The Recommendation System at Lyft Lyft 2023
How We Built a Multi-Task Canonical Ranker for Recommendations at Etsy Etsy 2023
Optimising marketing messages for Monzo users Monzo 2023
Twitter's Recommendation Algorithm Twitter 2023
How We Automated Content Marketing to Acquire Users at Scale Spotify 2023
Enhancing homepage feed relevance by harnessing the power of large corpus sparse ID embeddings LinkedIn 2023
Let AI Entertain You: Increasing User Engagement with Generative AI and Rejection Sampling Nextdoor 2023
Recommender systems need a user model Criteo 2023
The Next Step in Personalization: Dynamic Sizzles Netflix 2023
Using Contextual Bandit models in large action spaces at Instacart Instacart 2023
Training Foundation Improvements for Closeup Recommendation Ranker Pinterest 2023
Spotify Track Neural Recommender System Spotify 2023
Leveraging Real-Time User Actions to Personalize Etsy Ads Etsy 2023
Reinvent your recommender system using Vector Database and Opinion Mining Dailymotion 2023
How The New York Times Cooking Team Makes Personalized Recipe Recommendations New York Times 2023
Generating Diverse Travel Recommendations Expedia 2023
Personalisation @ Delivery Hero: Understanding Customers Delivery Hero 2023
Lessons Learnt From Consolidating ML Models in a Large Scale Recommendation System Netflix 2023
Increasing Travelers’ Engagement Through Price Alerts Expedia 2023
Stepping up marketing for advertisers: Scalable lookalike audience Grab 2023
Griffin: How Wayfair Leverages Reinforcement Learning to Send Customers Relevant Communications Wayfair 2023
Scaling the Instagram Explore recommendations system Meta 2023
Personalisation @ Delivery Hero: Ranking restaurants for new users Delivery Hero 2023
On the Diversity and Explainability of Enterprise App Recommendation Systems Salesforce 2023
Scaling marketing for merchants with targeted and intelligent promos Grab 2023
Don’t Worry, We Got You: Personalised Model Delivery Hero 2023
Experimenting with Machine Learning to Target In-App Messaging Spotify 2023
Multi-Relevance Ranking Model for Similar Item Recommendation eBay 2022
Beyond Matrix Factorization: Using hybrid features for user-business recommendations Yelp 2022
Gousto R-series Vol 2: Tackling the Cold-Start Problem in Recipe Recommendation Engine Gousto 2022
How Uber Optimizes the Timing of Push Notifications using ML and Linear Programming Uber 2022
Improving Instagram notification management with machine learning and causal inference Meta 2022
Personalizing Recommendations for a Learning User Instacart 2022
The Amazon Music conversational recommender is hitting the right notes Amazon 2022
Machine Learning for Snapchat Ad Ranking Snap 2022
How The New York Times Uses Machine Learning To Make Its Paywall Smarter New York Times 2022
Challenges and practical lessons from building a deep-learning-based ads CTR prediction model LinkedIn 2022
Reinforcement Learning for Budget Constrained Recommendations Netflix 2022
Client Time Series Model: a Multi-Target Recommender System based on Temporally-Masked Encoders Stitch Fix 2022
Scaling Product Recommendations using Basket Analysis - Part 1 Walmart 2022
Model-based candidate generation for account recommendations Twitter 2022
Nightingale: Scalable Daily Sales Email Sending Decision Model Wayfair 2022
Personalized Fishbowl Recommendations with Learned Embeddings: Part 2 Glassdoor 2022
Personalized Fishbowl Recommendations with Learned Embeddings: Part 1 Glassdoor 2022
Optimizing video feed recommendations with diversity: Machine Learning first steps Dailymotion 2022
Building A Recipe Recommender System For the Thermomix on Cookidoo – Part 1 Cookidoo 2022
Item2Vec: Neural Item Embeddings to enhance recommendations OLX 2021
Algorithm-Assisted Inventory Curation Stitch Fix 2021
Gousto R-series vol 1: Three tales of the Rouxcommender family Gousto 2021
The machine learning behind delivering relevant ads Pinterest 2021
Nextdoor Notifications: How we use ML to keep neighbors informed Nextdoor 2021
Interpretable Adaptive Optimization Apple 2021
The Rise (and Lessons Learned) of ML Models to Personalize Content on Home (Part I) Spotify 2021
Mozrt, a Deep Learning Recommendation System Empowering Walmart Store Associates with a Personalized Learning Experience Walmart 2021
Machine Learning and Reader Input Help Us Recommend Articles New York Times 2021
Evolution of Ads Bidding at Wayfair Wayfair 2021
Embedding-based Retrieval at Scribd Scribd 2021
MARS: Transformer Networks for Sequential Recommendation Wayfair 2021
The Rise (and Lessons Learned) of ML Models to Personalize Content on Home (Part II) Spotify 2021
How machine learning powers Facebook’s News Feed ranking Meta 2021
How Deep Learning can boost Contextual Advertising Capabilities Dailymotion 2021
Share of Voice Optimization Engine Wayfair 2021
Contextual Bandit for Marketing Treatment Optimization Wayfair 2021
How we built the good first issues feature GitHub 2020
How Lyft predicts a rider’s destination for better in-app experience Lyft 2020
Using machine learning to predict the value of ad requests Twitter 2020
Deep Reinforcement Learning in Production Part 2: Personalizing User Notifications Zynga 2020
Using machine learning to predict what file you need next Dropbox 2019
Building Lyft’s Marketing Automation Platform Lyft 2019
Empowering personalized marketing with machine learning Lyft 2018

Search & Ranking

Amazon Search: The Joy of Ranking Products (Paper, Video, Code) Amazon 2016
How Lazada Ranks Products to Improve Customer Experience and Conversion Lazada 2016
Ranking Relevance in Yahoo Search (Paper) Yahoo 2016
Learning to Rank Personalized Search Results in Professional Networks (Paper) LinkedIn 2016
Using Deep Learning at Scale in Twitter’s Timelines Twitter 2017
An Ensemble-based Approach to Click-Through Rate Prediction for Promoted Listings at Etsy (Paper) Etsy 2017
Powering Search & Recommendations at DoorDash DoorDash 2017
In-session Personalization for Talent Search (Paper) LinkedIn 2018
Food Discovery with Uber Eats: Building a Query Understanding Engine Uber 2018
Globally Optimized Mutual Influence Aware Ranking in E-Commerce Search (Paper) Alibaba 2018
Semantic Product Search (Paper) Amazon 2019
Machine Learning-Powered Search Ranking of Airbnb Experiences Airbnb 2019
Entity Personalized Talent Search Models with Tree Interaction Features (Paper) LinkedIn 2019
The AI Behind LinkedIn Recruiter Search and recommendation systems LinkedIn 2019
Learning Hiring Preferences: The AI Behind LinkedIn Jobs LinkedIn 2019
Neural Code Search: ML-based Code Search Using Natural Language Queries Facebook 2019
Aggregating Search Results from Heterogeneous Sources via Reinforcement Learning (Paper) Alibaba 2019
Cross-domain Attention Network with Wasserstein Regularizers for E-commerce Search Alibaba 2019
Understanding Searches Better Than Ever Before (Paper) Google 2019
How We Used Semantic Search to Make Our Search 10x Smarter Tokopedia 2019
Query2vec: Search query expansion with query embeddings GrubHub 2019
MOBIUS: Towards the Next Generation of Query-Ad Matching in Baidu’s Sponsored Search Baidu 2019
Why Do People Buy Seemingly Irrelevant Items in Voice Product Search? (Paper) Amazon 2020
Managing Diversity in Airbnb Search (Paper) Airbnb 2020
Improving Deep Learning for Airbnb Search (Paper) Airbnb 2020
Quality Matches Via Personalized AI for Hirer and Seeker Preferences LinkedIn 2020
Understanding Dwell Time to Improve LinkedIn Feed Ranking LinkedIn 2020
Ads Allocation in Feed via Constrained Optimization (Paper, Video) LinkedIn 2020
AI at Scale in Bing Microsoft 2020
Query Understanding Engine in Traveloka Universal Search Traveloka 2020
Bayesian Product Ranking at Wayfair Wayfair 2020
COLD: Towards the Next Generation of Pre-Ranking System (Paper) Alibaba 2020
Driving Shopping Upsells from Pinterest Search Pinterest 2020
GDMix: A Deep Ranking Personalization Framework (Code) LinkedIn 2020
Bringing Personalized Search to Etsy Etsy 2020
Building a Better Search Engine for Semantic Scholar Allen Institute for AI 2020
Query Understanding for Natural Language Enterprise Search (Paper) Salesforce 2020
Query Understanding for Surfacing Under-served Music Content (Paper) Spotify 2020
Embedding-based Retrieval in Facebook Search (Paper) Facebook 2020
Towards Personalized and Semantic Retrieval for E-commerce Search via Embedding Learning (Paper) JD 2020
QUEEN: Neural query rewriting in e-commerce (Paper) Amazon 2021
Using Learning-to-rank to Precisely Locate Where to Deliver Packages (Paper) Amazon 2021
Seasonal relevance in e-commerce search (Paper) Amazon 2021
Graph Intention Network for Click-through Rate Prediction in Sponsored Search (Paper) Alibaba 2021
How We Built A Context-Specific Bidding System for Etsy Ads Etsy 2021
Pre-trained Language Model based Ranking in Baidu Search (Paper) Baidu 2021
Deep Natural Language Processing for LinkedIn Search Systems (Paper) LinkedIn 2021
Siamese BERT-based Model for Web Search Relevance Ranking (Paper, Code) Seznam 2021
SearchSage: Learning Search Query Representations at Pinterest Pinterest 2021
Query2Prod2Vec: Grounded Word Embeddings for eCommerce Coveo 2021
3 Changes to Expand DoorDash’s Product Search Beyond Delivery DoorDash 2022
Learning To Rank Diversely Airbnb 2022
How to Optimise Rankings with Cascade Bandits Expedia 2022
A Guide to Google Search Ranking Systems Google 2022
Deep Learning for Search Ranking at Etsy Etsy 2022
Search at Calm Calm 2022
Personalized Transformer-based Ranking for e-Commerce at Yandex
Rankitect: Ranking Architecture Search Battling World-class Engineers at Meta Scale
Que2Engage: Embedding-based Retrieval for Relevant and Engaging Products at Facebook Marketplace
Fuzzy Substring Matching: On-device Fuzzy Friend Search at Snapchat
Applying Deep Learning To Airbnb Search
An Embedding-Based Grocery Search Model at Instacart
Challenges in Search on Streaming Services: Netflix Case Study
Visual Discovery at Pinterest
Demystifying Core Ranking in Pinterest Image Search
Rethinking Personalized Ranking at Pinterest: An End-to-End Approach
TransAct: Transformer-based Realtime User Action Model for Recommendation at Pinterest
Visual Search at Alibaba
Personalized Expertise Search at LinkedIn
Search by Ideal Candidates: Next Generation of Talent Search at LinkedIn
LiRank: Industrial Large Scale Ranking Models at LinkedIn
Prioritizing Home Attributes Based on Guest Interest Airbnb 2023
Building Airbnb Categories with ML & Human in the Loop Airbnb 2023
Feature Spotlight: Query Suggestions Algolia 2023
Building In-Video Search Netflix 2023
Resolve Cases Quickly with Interactive Einstein Search Answers Salesforce 2023
Swiggy’s Generative AI Journey: A Peek Into the Future Swiggy 2023
From Image Classification to Multitask Modeling: Building Etsy’s Search by Image Feature Etsy 2023
How LinkedIn Is Using Embeddings to Up Its Match Game for Job Seekers LinkedIn 2023
How Instacart Uses Embeddings to Improve Search Relevance Instacart 2022
Introducing Natural Language Search for Podcast Episodes Spotify 2022
Explore-exploit dilemma in Ranking model Trivago 2022
Improving Post Search at LinkedIn LinkedIn 2022
How Instacart Uses Machine Learning-Driven Autocomplete to Help People Fill Their Carts Instacart 2022
Real-Time Personalisation of Search Results with Auto Trader's Customer Data Platform Autotrader 2022
Real-time ranking at Faire part 2: the feature store Faire 2022
Wayfair’s New Approach to Aspect Based Sentiment Analysis Helps Customers Easily Find “Long Tail” Products Wayfair 2022
Building Faire’s new marketplace ranking infrastructure Faire 2021
How image search works at Dropbox Dropbox 2021
Learning To Rank Restaurants Swiggy 2021
Using Deep Learning for Ranking in Dish Search Swiggy 2021
Personalized Ranking Model for Lodging Expedia 2021
Improving Deep Learning for Ranking Stays at Airbnb Airbnb 2020
Guided Search — Personalized Search Refinements to Help Customers Find their Dream Home Zillow 2020
Things Not Strings: Understanding Search Intent with Better Recall DoorDash 2020
The Secret Sauce Behind Search Personalisation Gojek 2019
Give Me Jeans not Shoes: How BERT Helps Us Deliver What Clients Want Stitch Fix 2019
Is This What You Were Looking For? Gojek 2019

Embeddings

Vector Representation Of Items, Customer And Cart To Build A Recommendation System (Paper) Sears 2017
Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (Paper) Alibaba 2018
Embeddings@Twitter Twitter 2018
Listing Embeddings in Search Ranking (Paper) Airbnb 2018
Understanding Latent Style Stitch Fix 2018
Towards Deep and Representation Learning for Talent Search at LinkedIn (Paper) LinkedIn 2018
Personalized Store Feed with Vector Embeddings DoorDash 2018
Should we Embed? A Study on Performance of Embeddings for Real-Time Recommendations (Paper) Moshbit 2019
Machine Learning for a Better Developer Experience Netflix 2020
Announcing ScaNN: Efficient Vector Similarity Search (Paper, Code) Google 2020
BERT Goes Shopping: Comparing Distributional Models for Product Representations Coveo 2021
The Embeddings That Came in From the Cold: Improving Vectors for New and Rare Products with Content-Based Inference Coveo 2022
Multi-objective Hyper-parameter Optimization of Behavioral Song Embeddings (Paper) Apple 2022
Embeddings at Spotify's Scale - How Hard Could It Be? Spotify 2023
TIES: Temporal Interaction Embeddings For Enhancing Social Media Integrity At Facebook
PinnerSage: Multi-Modal User Embedding Framework for Recommendations at Pinterest
ItemSage: Learning Product Embeddings for Shopping Recommendations at Pinterest
A Distributed Multi-GPU System for Large-Scale Node Embedding at Tencent
Binary Embedding-based Retrieval at Tencent

Natural Language Processing

Abusive Language Detection in Online User Content (Paper) Yahoo 2016
Smart Reply: Automated Response Suggestion for Email (Paper) Google 2016
Building Smart Replies for Member Messages LinkedIn 2017
How Natural Language Processing Helps LinkedIn Members Get Support Easily LinkedIn 2019
Gmail Smart Compose: Real-Time Assisted Writing (Paper) Google 2019
Goal-Oriented End-to-End Conversational Models with Profile Features in a Real-World Setting (Paper) Amazon 2019
DeText: A deep NLP Framework for Intelligent Text Understanding (Code) LinkedIn 2020
SmartReply for YouTube Creators Google 2020
Using Neural Networks to Find Answers in Tables (Paper) Google 2020
A Scalable Approach to Reducing Gender Bias in Google Translate Google 2020
Assistive AI Makes Replying Easier Microsoft 2020
AI Advances to Better Detect Hate Speech Facebook 2020
A State-of-the-Art Open Source Chatbot (Paper) Facebook 2020
A Highly Efficient, Real-Time Text-to-Speech System Deployed on CPUs Facebook 2020
Deep Learning to Translate Between Programming Languages (Paper, Code) Facebook 2020
Deploying Lifelong Open-Domain Dialogue Learning (Paper) Facebook 2020
Introducing Dynabench: Rethinking the way we benchmark AI Facebook 2020
The State-of-the-art Open-Domain Chatbot in Chinese and English (Paper) Baidu 2020
PEGASUS: A State-of-the-Art Model for Abstractive Text Summarization (Paper, Code) Google 2020
Photon: A Robust Cross-Domain Text-to-SQL System (Paper) (Demo) Salesforce 2020
GeDi: A Powerful New Method for Controlling Language Models (Paper, Code) Salesforce 2020
Applying Topic Modeling to Improve Call Center Operations RICOH 2020
WIDeText: A Multimodal Deep Learning Framework Airbnb 2020
Dynaboard: Moving Beyond Accuracy to Holistic Model Evaluation in NLP (Code) Facebook 2021
How we reduced our text similarity runtime by 99.96% Microsoft 2021
Textless NLP: Generating expressive speech from raw audio (Part 1) (Part 2) (Part 3) (Code and Pretrained Models) Facebook 2021
Grammar Correction as You Type, on Pixel 6 Google 2021
Auto-generated Summaries in Google Docs Google 2022
ML-Enhanced Code Completion Improves Developer Productivity Google 2022
Words All the Way Down — Conversational Sentiment Analysis PayPal 2022
Query Processing at Snapchat: How we Handle Query Completion, Suggestion and Localization
Toward a full-scale neural machine translation in production: the Booking.com use case
Improving Query Safety at Pinterest
A New Era of Creativity: Expert-in-the-loop Generative AI at Stitch Fix Stitch Fix 2023
All the Hard Stuff Nobody Talks About when Building Products with LLMs Honeycomb 2023
Using topic modelling to understand customer saving goals Monzo 2023
Improving the Performance of NLP Systems on the Gender-Neutral “They” Grammarly 2023
Bringing the world closer together with a foundational multimodal model for speech translation Meta 2023
A Unified Multi-task Model for Supporting Multiple Virtual Assistants in Walmart Walmart 2022
Using predictive technology to foster constructive conversations Nextdoor 2022
Categorising Customer Feedback Using Unsupervised Learning Expedia 2022
Conversation Summaries in Google Chat Google 2022
Under the Hood of the Grammarly Editor, Part Two: How Suggestions Work Grammarly 2022
Helping Home Shoppers Find a Home to Love Through Home Insights Zillow 2022
Incorporating Listing Descriptions into the Zestimate Zillow 2022
Innovating the Basics: Achieving Superior Precision and Recall in Grammatical Error Correction Grammarly 2022
Building Wayfair’s First Virtual Assistant: Automating Customer Service by Text Based Intent Prediction Wayfair 2022
Information Extraction at Scribd Scribd 2021
Multilingual message content moderation at scale (part 2) Bumble 2021
ATTN: How Grammarly’s NLP/ML Team Figured Out Where Readers Focus in an Email Grammarly 2021
How Pinterest powers a healthy comment ecosystem with machine learning Pinterest 2021
Grammatical Error Correction: Tag, Not Rewrite Grammarly 2021
Improving Recommendation Quality by Tapping into Listing Text Zillow 2021
Multilingual message content moderation at scale (part 1) Bumble 2021
Adversarial Grammatical Error Correction Grammarly 2021
How we used Cross-Lingual Transfer Learning to categorize our content Dailymotion 2020
How Duolingo uses AI in every part of its app Duolingo 2020
Language Identification from Very Short Strings Apple 2019

Sequence Modelling

Doctor AI: Predicting Clinical Events via Recurrent Neural Networks (Paper) Sutter Health 2015
Deep Learning for Understanding Consumer Histories (Paper) Zalando 2016
Using Recurrent Neural Network Models for Early Detection of Heart Failure Onset (Paper) Sutter Health 2016
Continual Prediction of Notification Attendance with Classical and Deep Networks (Paper) Telefonica 2017
Deep Learning for Electronic Health Records (Paper) Google 2018
Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction (Paper) Alibaba 2019
Search-based User Interest Modeling with Sequential Behavior Data for CTR Prediction (Paper) Alibaba 2020
Leveraging Online Social Interactions For Enhancing Integrity at Facebook (Paper, Video) Facebook 2020
Using deep learning to detect abusive sequences of member activity (Video) LinkedIn 2021
Soft Frequency Capping for Improved Ad Click Prediction in Yahoo Gemini Native
PinnerFormer: Sequence Modeling for User Representation at Pinterest

Computer Vision

Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning Dropbox 2017
Categorizing Listing Photos at Airbnb Airbnb 2018
Amenity Detection and Beyond — New Frontiers of Computer Vision at Airbnb Airbnb 2019
How we Improved Computer Vision Metrics by More Than 5% Only by Cleaning Labelling Errors Deepomatic
Making machines recognize and transcribe conversations in meetings using audio and video Microsoft 2019
Powered by AI: Advancing product understanding and building new shopping experiences Facebook 2020
A Neural Weather Model for Eight-Hour Precipitation Forecasting (Paper) Google 2020
Machine Learning-based Damage Assessment for Disaster Relief (Paper) Google 2020
RepNet: Counting Repetitions in Videos (Paper) Google 2020
Converting Text to Images for Product Discovery (Paper) Amazon 2020
How Disney Uses PyTorch for Animated Character Recognition Disney 2020
Image Captioning as an Assistive Technology (Video) IBM 2020
AI for AG: Production machine learning for agriculture Blue River 2020
AI for Full-Self Driving at Tesla Tesla 2020
On-device Supermarket Product Recognition Google 2020
Using Machine Learning to Detect Deficient Coverage in Colonoscopy Screenings (Paper) Google 2020
Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper, Video) Pinterest 2020
Developing Real-Time, Automatic Sign Language Detection for Video Conferencing (Paper) Google 2020
Vision-based Price Suggestion for Online Second-hand Items (Paper) Alibaba 2020
New AI Research to Help Predict COVID-19 Resource Needs From X-rays (Paper, Model) Facebook 2021
An Efficient Training Approach for Very Large Scale Face Recognition (Paper) Alibaba 2021
Identifying Document Types at Scribd Scribd 2021
Semi-Supervised Visual Representation Learning for Fashion Compatibility (Paper) Walmart 2021
Recognizing People in Photos Through Private On-Device Machine Learning Apple 2021
DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection Google 2022
Contrastive language and vision learning of general fashion concepts (Paper) Coveo 2022
Leveraging Computer Vision for Search Ranking BazaarVoice 2023
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
Learning a Unified Embedding for Visual Search at Pinterest
Bootstrapping Complete The Look at Pinterest
Virtual ID Discovery from E-commerce Media at Alibaba: Exploiting Richness of User Click Behavior for Visual Search Relevance
Personalized ‘Complete the Look’ model Walmart 2023
Yelp Content As Embeddings Yelp 2023
Fast Class-Agnostic Salient Object Segmentation Apple 2023
Uber’s Real-Time Document Check Uber 2022
A snapshot of AI-powered reminiscing in Google Photos Google 2021
Stitching together spaces for query-based recommendations Stitch Fix 2021
Using Rich Image and Text Data to Categorize Products at Scale Shopify 2021
AI-Created Outfits Nordstrom 2021
Image detection as a service Bumble 2020
Zillow Floor Plan: Training Models to Detect Windows, Doors and Openings in Panoramas Zillow 2020
The Visual Complements Model (ViCs): Complementary Product Recommendations From Visual Cues Wayfair 2020

Reinforcement Learning

Deep Reinforcement Learning for Sponsored Search Real-time Bidding (Paper) Alibaba 2018
Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising (Paper) Alibaba 2018
Reinforcement Learning for On-Demand Logistics DoorDash 2018
Reinforcement Learning to Rank in E-Commerce Search Engine (Paper) Alibaba 2018
Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning (Paper) Alibaba 2019
Productionizing Deep Reinforcement Learning with Spark and MLflow Zynga 2020
Deep Reinforcement Learning in Production (Part 1) (Part 2) Zynga 2020
Building AI Trading Systems Denny Britz 2020
Shifting Consumption towards Diverse content via Reinforcement Learning (Paper) Spotify 2022
Bandits for Online Calibration: An Application to Content Moderation on Social Media Platforms Meta 2022
Selecting the Best Image for Each Merchant Using Exploration and Machine Learning DoorDash 2023
A Better Match for Drivers and Riders: Reinforcement Learning at Lyft

Anomaly Detection

Detecting Performance Anomalies in External Firmware Deployments Netflix 2019
Detecting and Preventing Abuse on LinkedIn using Isolation Forests video (Code) LinkedIn 2019
Deep Anomaly Detection with Spark and Tensorflow (Hopsworks Video) Swedbank, Hopsworks 2019
The Technology Behind Fighting Harassment on LinkedIn LinkedIn 2020
Uncovering Insurance Fraud Conspiracy with Network Learning (Paper) Ant Financial 2020
How Does Spam Protection Work on Stack Exchange? Stack Exchange 2020
Auto Content Moderation in C2C e-Commerce Mercari 2020
Blocking Slack Invite Spam With Machine Learning Slack 2020
Cloudflare Bot Management: Machine Learning and More Cloudflare 2020
Anomalies in Oil Temperature Variations in a Tunnel Boring Machine SENER 2020
Using Anomaly Detection to Monitor Low-Risk Bank Customers Rabobank 2020
Fighting fraud with Triplet Loss OLX Group 2020
Facebook is Now Using AI to Sort Content for Quicker Moderation (Alternative) Facebook 2020
❌ How AI is getting better at detecting hate speech Part 1, Part 2, Part 3, Part 4 Facebook 2020
Project RADAR: Intelligent Early Fraud Detection System with Humans in the Loop Uber 2022
Graph for Fraud Detection Grab 2022
Evolving our machine learning to stop mobile bots Cloudflare 2022
Improving the accuracy of our machine learning WAF using data augmentation and sampling Cloudflare 2022
Machine Learning for Fraud Detection in Streaming Services Netflix 2022
Pricing at Lyft Lyft 2022

Graph

Building The LinkedIn Knowledge Graph LinkedIn 2016
Scaling Knowledge Access and Retrieval at Airbnb Airbnb 2018
Graph Convolutional Neural Networks for Web-Scale Recommender Systems (Paper) Pinterest 2018
Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations Uber 2019
AliGraph: A Comprehensive Graph Neural Network Platform (Paper) Alibaba 2019
Contextualizing Airbnb by Building Knowledge Graph Airbnb 2019
Retail Graph — Walmart’s Product Knowledge Graph Walmart 2020
Traffic Prediction with Advanced Graph Neural Networks DeepMind 2020
SimClusters: Community-Based Representations for Recommendations (Paper, Video) Twitter 2020
Metapaths guided Neighbors aggregated Network for Heterogeneous Graph Reasoning (Paper) Alibaba 2021
JEL: Applying End-to-End Neural Entity Linking in JPMorgan Chase (Paper) JPMorgan Chase 2021
How AWS uses graph neural networks to meet customer needs Amazon 2022
Layered Graph Embedding for Entity Recommendation using Wikipedia in the Yahoo! Knowledge Graph
MultiBiSage: A Web-Scale Recommendation System Using Multiple Bipartite Graphs at Pinterest
AliCG: Fine-grained and Evolvable Conceptual Graph Construction for Semantic Search at Alibaba
LiGNN: Graph Neural Networks at LinkedIn

Optimization

Matchmaking in Lyft Line (Part 1) (Part 2) (Part 3) Lyft 2016
The Data and Science behind GrabShare Carpooling paper (Part 1) Grab 2017
How Trip Inferences and Machine Learning Optimize Delivery Times on Uber Eats Uber 2018
Next-Generation Optimization for Dasher Dispatch at DoorDash DoorDash 2020
Optimization of Passengers Waiting Time in Elevators Using Machine Learning Thyssen Krupp AG 2020
Think Out of The Package: Recommending Package Types for E-commerce Shipments (Paper) Amazon 2020
Optimizing DoorDash’s Marketing Spend with Machine Learning DoorDash 2020
Revenue Maximization of Airbnb Marketplace using Search Results
Business Policy Experiments using Fractional Factorial Designs: Consumer Retention on DoorDash
Truncation-Free Matching System for Display Advertising at Alibaba
Bidding Agent Design in the LinkedIn Ad Marketplace
Sales Channel Optimization via Simulations Based on Observational Data with Delayed Rewards: A Case Study at LinkedIn
Large-language models for automatic cloud incident management Microsoft 2023
Building the Neural Zestimate Zillow 2023
Improving the customer’s experience via ML-driven payment routing LinkedIn 2023
Using Synthetic Search Data for Flights Price Forecasting Expedia 2023
Accelerating AI: Implementing Multi-GPU Distributed Training for Personalized Recommendations Stitch Fix 2023
Exploring an Entity Resolution Framework Across Various Use Cases Walmart 2023
Is this a date? Using ML to identify date formats in file names Dropbox 2023
Predicting package dimensions based on a similarity model at Mercado Libre Mercado Libre 2022
Leveraging machine learning to find security vulnerabilities GitHub 2022
How We Built Infrastructure to Run User Forecasts at Spotify Spotify 2022
How AI Text Generation Models Are Reshaping Customer Support at Airbnb Airbnb 2022
Forecast Anomalies in Refrigeration with PySpark & Sensor-data Walmart 2022
Pricing at Lyft Lyft 2022
Using deep learning to detect dissonance between address text and location Swiggy 2022
Didact AI: The anatomy of an ML-powered stock picking engine Didact AI 2022
How we went from zero insight to predicting service time with a machine learning model — Part 2/2 Oda 2022
For your eyes only: improving Netflix video quality with neural networks Netflix 2022
Using Machine Learning for Fast Test Feedback to Developers and Test Suite Optimization Siemens Healthineers 2022
Applying multitask learning to AI models at LinkedIn LinkedIn 2022
ML and customer support (Part 1): Using Machine Learning to enable world-class customer support Microsoft 2021
Forecasting SQL query resource usage with machine learning Twitter 2021
Applying Machine Learning in Internal Audit with Sparsely Labeled Data Uber 2021
ML and customer support (Part 2): Leveraging topic modeling to identify the top investment areas in support cases Microsoft 2021
Predicting Hard Drive Failure with Machine Learning Datto 2021
Optimizing payments with machine learning Dropbox 2021
How DoorDash Quickly Spins Up Multiple Image Recognition Use Cases DoorDash 2021
Automating Data Protection at Scale, Part 2 Airbnb 2021
Automated detection, diagnosis & remediation of app failure Capital One 2021
Predicting Defrost in Refrigeration Cases at Walmart using Fourier Transform Walmart 2021
Using Machine Learning to Improve Payment Authorization Rate PayPal 2021
Using ML and Optimization to Solve DoorDash’s Dispatch Problem DoorDash 2021
Learning to Predict Two-Wheeler Travel Distance Swiggy 2021
How we used ML — and heuristic data labeling — to help customers with their cloud migration Microsoft 2021
How Gojek Uses NLP to Name Pickup Locations at Scale Gojek 2020
Testing Firefox more efficiently with machine learning Mozilla 2020
Optimizing payment conversion rates with contextual multi-armed bandits Adyen 2020
Detecting Stop Signs and Traffic Signals: Deep Learning at Lyft Mapping Lyft 2019
How Lyft Creates Hyper-Accurate Maps from Open-Source Maps and Real-Time Data Lyft 2019
Human-Like Playtesting with Deep Learning King 2019
Using Machine Learning to Improve Streaming Quality at Netflix Netflix 2018
Space, Time and Groceries Instacart 2017

Information Extraction

Unsupervised Extraction of Attributes and Their Values from Product Description (Paper) Rakuten 2013
Using Machine Learning to Index Text from Billions of Images Dropbox 2018
Extracting Structured Data from Templatic Documents (Paper) Google 2020
AutoKnow: self-driving knowledge collection for products of thousands of types (Paper, Video) Amazon 2020
One-shot Text Labeling using Attention and Belief Propagation for Information Extraction (Paper) Alibaba 2020
Information Extraction from Receipts with Graph Convolutional Networks Nanonets 2021
Producing Usable Taxonomies Cheaply and Rapidly at Pinterest Using Discovered Dynamic μ-Topics
Deep Job Understanding at LinkedIn
Hamlet: Wayfair's ML Approach to Identifying Business Shopper Wayfair 2023
Ocelot: Scaling observational causal inference at LinkedIn LinkedIn 2022
A Survey of Causal Inference Applications at Netflix Netflix 2022
Beyond prediction machines Nubank 2021
Causal Inference — Estimating Long-term Engagement Mercado Libre 2021
Modeling Uplift Directly: Uplift Decision Tree with KL Divergence and Euclidean Distance as Splitting Criteria Wayfair 2019

Weak Supervision

Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale (Paper) Google 2019
Osprey: Weak Supervision of Imbalanced Extraction Problems without Code (Paper) Intel 2019
Bootstrapping Conversational Agents with Weak Supervision (Paper) IBM 2019
HierCat: Hierarchical Query Categorization from Weakly Supervised Data at Facebook Marketplace

Generation

Better Language Models and Their Implications (Paper) OpenAI 2019
Image GPT (Paper, Code) OpenAI 2019
Language Models are Few-Shot Learners (Paper) (GPT-3 Blog post) OpenAI 2020
Deep Learned Super Resolution for Feature Film Production (Paper) Pixar 2020
Unit Test Case Generation with Transformers Microsoft 2021
Observation-based unit test generation at Meta
Automated Unit Test Improvement using Large Language Models at Meta
Inside GitHub: Working with the LLMs behind GitHub Copilot GitHub 2023
LLM-powered data classification for data entities at scale Grab 2023
Introducing Code Llama, a state-of-the-art large language model for coding Meta 2023
How to build an enterprise LLM application: Lessons from GitHub Copilot GitHub 2023
DoorDash identifies Five big areas for using Generative AI DoorDash 2023
Large-Scale Generation of ML Podcast Previews at Spotify with Google Dataflow Spotify 2023
Building Boba AI Thoughtworks 2023
How Whatnot Utilizes Generative AI to Enhance Trust and Safety Whatnot 2023
AI Summarist: Get Your Time Back on Slack, Boost Productivity & Focus, Personalize Information Consumption Salesforce 2023
Generative AI-enabled compliance for software development GitHub 2023
Scaling Productivity with Ava — Instacart’s Internal AI Assistant Instacart 2023
From idea to reality: Elevating our customer support through generative AI Vimeo 2023
Improving Virtual Card Numbers with Edge Machine Learning Capital One 2021

Audio

Improving On-Device Speech Recognition with VoiceFilter-Lite (Paper) Google 2020
The Machine Learning Behind Hum to Search Google 2020
Detecting Speech and Music in Audio Content Netflix 2023
Voice Reorder Experience: add Multiple Product Items to your shopping cart Walmart 2022

Privacy-Preserving Machine Learning

Federated Learning: Collaborative Machine Learning without Centralized Training Data (Paper) Google 2017
Federated Learning with Formal Differential Privacy Guarantees (Paper) Google 2022
MPC-based machine learning: Achieving end-to-end privacy-preserving machine learning (Paper) Facebook 2022
Personalized Federated Search at LinkedIn
LinkedIn Salary: A System for Secure Collection and Presentation of Structured Compensation Insights to Job Seekers
PriPeARL: A Framework for Privacy-Preserving Analytics and Reporting at LinkedIn
LinkedIn's Audience Engagements API: A Privacy Preserving Data Analytics System at Scale

Validation and A/B Testing

Overlapping Experiment Infrastructure: More, Better, Faster Experimentation (Paper) Google 2010
The Reusable Holdout: Preserving Validity in Adaptive Data Analysis (Paper) Google 2015
Twitter Experimentation: Technical Overview Twitter 2015
It’s All A/Bout Testing: The Netflix Experimentation Platform Netflix 2016
Building Pinterest’s A/B Testing Platform Pinterest 2016
Experimenting to Solve Cramming Twitter 2017
Building an Intelligent Experimentation Platform with Uber Engineering Uber 2017
Scaling Airbnb’s Experimentation Platform Airbnb 2017
Meet Wasabi, an Open Source A/B Testing Platform (Code) Intuit 2017
Analyzing Experiment Outcomes: Beyond Average Treatment Effects Uber 2018
Under the Hood of Uber’s Experimentation Platform Uber 2018
Constrained Bayesian Optimization with Noisy Experiments (Paper) Facebook 2018
Reliable and Scalable Feature Toggles and A/B Testing SDK at Grab Grab 2018
Modeling Conversion Rates and Saving Millions Using Kaplan-Meier and Gamma Distributions (Code) Better 2019
Detecting Interference: An A/B Test of A/B Tests LinkedIn 2019
Announcing a New Framework for Designing Optimal Experiments with Pyro (Paper) (Paper) Uber 2020
Enabling 10x More Experiments with Traveloka Experiment Platform Traveloka 2020
Large Scale Experimentation at Stitch Fix (Paper) Stitch Fix 2020
Multi-Armed Bandits and the Stitch Fix Experimentation Platform Stitch Fix 2020
Experimentation with Resource Constraints Stitch Fix 2020
Computational Causal Inference at Netflix (Paper) Netflix 2020
Key Challenges with Quasi Experiments at Netflix Netflix 2020
Making the LinkedIn experimentation engine 20x faster LinkedIn 2020
Our Evolution Towards T-REX: The Prehistory of Experimentation Infrastructure at LinkedIn LinkedIn 2020
How to Use Quasi-experiments and Counterfactuals to Build Great Products Shopify 2020
Improving Experimental Power through Control Using Predictions as Covariate DoorDash 2020
Supporting Rapid Product Iteration with an Experimentation Analysis Platform DoorDash 2020
Improving Online Experiment Capacity by 4X with Parallelization and Increased Sensitivity DoorDash 2020
Leveraging Causal Modeling to Get More Value from Flat Experiment Results DoorDash 2020
Iterating Real-time Assignment Algorithms Through Experimentation DoorDash 2020
Spotify’s New Experimentation Platform (Part 1) (Part 2) Spotify 2020
Interpreting A/B Test Results: False Positives and Statistical Significance Netflix 2021
Interpreting A/B Test Results: False Negatives and Power Netflix 2021
Running Experiments with Google Adwords for Campaign Optimization DoorDash 2021
The 4 Principles DoorDash Used to Increase Its Logistics Experiment Capacity by 1000% DoorDash 2021
Experimentation Platform at Zalando: Part 1 - Evolution Zalando 2021
Designing Experimentation Guardrails Airbnb 2021
How Airbnb Measures Future Value to Standardize Tradeoffs Airbnb 2021
Network Experimentation at Scale (Paper) Facebook 2021
Universal Holdout Groups at Disney Streaming Disney 2021
Experimentation is a major focus of Data Science across Netflix Netflix 2022
Search Journey Towards Better Experimentation Practices Spotify 2022
Artificial Counterfactual Estimation: Machine Learning-Based Causal Inference at Airbnb Airbnb 2022
Beyond A/B Test : Speeding up Airbnb Search Ranking Experimentation through Interleaving Airbnb 2022
Challenges in Experimentation Lyft 2022
Overtracking and Trigger Analysis: Reducing sample sizes while INCREASING sensitivity Booking 2022
Meet Dash-AB — The Statistics Engine of Experimentation at DoorDash DoorDash 2022
Comparing quantiles at scale in online A/B-testing Spotify 2022
Accelerating our A/B experiments with machine learning Dropbox 2023
Supercharging A/B Testing at Uber Uber
Democratizing online controlled experiments at Booking.com
Mediation Analysis in Online Experiments at Booking.com: Disentangling Direct and Indirect Effects

Model Management

Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions Comcast 2018
Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper) Apple 2019
Runway - Model Lifecycle Management at Netflix Netflix 2020
Managing ML Models @ Scale - Intuit’s ML Platform Intuit 2020
ML Model Monitoring - 9 Tips From the Trenches Nubank 2021
Dealing with Train-serve Skew in Real-time ML Models: A Short Guide Nubank 2023
Adversarial Validation Approach to Concept Drift Problem in User Targeting Automation Systems at Uber

Efficiency

GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce (Paper) Facebook 2020
How We Scaled Bert To Serve 1+ Billion Daily Requests on CPUs Roblox 2020
Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks (Paper) Uber 2021

Ethics

Building Inclusive Products Through A/B Testing (Paper) LinkedIn 2020
LiFT: A Scalable Framework for Measuring Fairness in ML Applications (Paper) LinkedIn 2020
Introducing Twitter’s first algorithmic bias bounty challenge Twitter 2021
Examining algorithmic amplification of political content on Twitter Twitter 2021
A closer look at how LinkedIn integrates fairness into its AI products LinkedIn 2022
Fairness-Aware Ranking in Search & Recommendation Systems with Application to LinkedIn Talent Search
Disentangling and Operationalizing AI Fairness at LinkedIn

Infra

Reengineering Facebook AI’s Deep Learning Platforms for Interoperability Facebook 2020
Elastic Distributed Training with XGBoost on Ray Uber 2021

MLOps Platforms

Meet Michelangelo: Uber’s Machine Learning Platform Uber 2017
Big Data Machine Learning Platform at Pinterest Pinterest 2019
Core Modeling at Instagram Instagram 2019
Open-Sourcing Metaflow - a Human-Centric Framework for Data Science Netflix 2019
Real-time Machine Learning Inference Platform at Zomato Zomato 2020
Introducing Flyte: Cloud Native Machine Learning and Data Processing Platform Lyft 2020
Building Flexible Ensemble ML Models with a Computational Graph DoorDash 2021
LyftLearn: ML Model Training Infrastructure built on Kubernetes Lyft 2021
"You Don't Need a Bigger Boat": A Full Data Pipeline Built with Open-Source Tools (Paper) Coveo 2021
MLOps at GreenSteam: Shipping Machine Learning GreenSteam 2021
Evolving Reddit’s ML Model Deployment and Serving Architecture Reddit 2021
Redesigning Etsy’s Machine Learning Platform Etsy 2021
Building a Platform for Serving Recommendations at Etsy Etsy 2022
Intelligent Automation Platform: Empowering Conversational AI and Beyond at Airbnb Airbnb 2022
DARWIN: Data Science and Artificial Intelligence Workbench at LinkedIn LinkedIn 2022
The Magic of Merlin: Shopify's New Machine Learning Platform Shopify 2022
Zalando's Machine Learning Platform Zalando 2022
Inside Meta's AI optimization platform for engineers across the company (Paper) Meta 2022
Monzo’s machine learning stack Monzo 2022
Evolution of ML Fact Store Netflix 2022
Using MLOps to Build a Real-time End-to-End Machine Learning Pipeline Binance 2022
Serving Machine Learning Models Efficiently at Scale at Zillow Zillow 2022
Deployment for Free - A Machine Learning Platform for Stitch Fix's Data Scientists Stitch Fix 2022
Machine Learning Operations (MLOps): Overview, Definition, and Architecture (Paper) IBM 2022
FaaSNet: Scalable and Fast Provisioning of Custom Serverless Container Runtimes at Alibaba Cloud Function Compute
Amazon SageMaker Model Monitor: A System for Real-Time Insights into Deployed Machine Learning Models
AlerTiger: Deep Learning for AI Model Health Monitoring at LinkedIn

Practices

Practical Recommendations for Gradient-Based Training of Deep Architectures (Paper) Yoshua Bengio 2012
Machine Learning: The High Interest Credit Card of Technical Debt (Paper) Google 2014
Good Data Analysis Google 2019
Rules of Machine Learning: Best Practices for ML Engineering Google 2018
On Challenges in Machine Learning Model Management Amazon 2018
Machine Learning in Production: The Booking.com Approach Booking 2019
150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (Paper) Booking 2019
Successes and Challenges in Adopting Machine Learning at Scale at a Global Bank Rabobank 2019
Challenges in Deploying Machine Learning: a Survey of Case Studies (Paper) Cambridge 2020
The problem with AI developer tools for enterprises Databricks 2020
Continuous Integration and Deployment for Machine Learning Online Serving and Models Uber 2021
Tuning Model Performance Uber 2021
Maintaining Machine Learning Model Accuracy Through Monitoring DoorDash 2021
Building Scalable and Performant Marketing ML Systems at Wayfair Wayfair 2021
Our approach to building transparent and explainable AI systems LinkedIn 2021
5 Steps for Building Machine Learning Models for Business Shopify 2021
Data Is An Art, Not Just A Science—And Storytelling Is The Key Shopify 2022
Best Practices for Real-time Machine Learning: Alerting Nubank 2022
Automatic Retraining for Machine Learning Models: Tips and Lessons Learned Nubank 2022
ML Education at Uber: Frameworks Inspired by Engineering Principles Uber 2022

Team Structure

What is the most effective way to structure a data science team? Udemy 2017
Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department Stitch Fix 2016
Building The Analytics Team At Wish Wish 2018
Beware the Data Science Pin Factory: The Power of the Full-Stack Data Science Generalist Stitch Fix 2019
Cultivating Algorithms: How We Grow Data Science at Stitch Fix Stitch Fix
Analytics at Netflix: Who We Are and What We Do Netflix 2020
Building a Data Team at a Mid-stage Startup: A Short Story Erikbern 2021
A Behind-the-Scenes Look at How Postman’s Data Team Works Postman 2021
Data Scientist x Machine Learning Engineer Roles: How are they different? How are they alike? Nubank 2022

Fails

When It Comes to Gorillas, Google Photos Remains Blind Google 2018
160k+ High School Students Will Graduate Only If a Model Allows Them to International Baccalaureate 2020
An Algorithm That ‘Predicts’ Criminality Based on a Face Sparks a Furor Harrisburg University 2020
It's Hard to Generate Neural Text From GPT-3 About Muslims OpenAI 2020
A British AI Tool to Predict Violent Crime Is Too Flawed to Use United Kingdom 2020
✅ More in awful-ai
AI Incident Database Partnership on AI 2022
