AI Engineering 101

Everything you need to get started with AI Engineering. In this repository, I'm documenting my AI Engineering learning notes.

Table of Contents

  • Vibe Coding
  • Research
    • Agent Infra
    • Multimodal Infra
  • Engineering

Research

Research advances in agent infra and multimodal infra

Agent Infra

Agent infra focuses on optimizing agent runtime performance rather than on building agents themselves. For more background, check out Why agent infrastructure matters and Agent Engineering: A New Discipline.

  1. Identifying the Risks of LM Agents with an LM-Emulated Sandbox. ICLR 2024.
  2. Autellix: An Efficient Serving Engine for LLM Agents as General Programs. arXiv 2025.

Multimodal Infra

If you want to learn more about multimodal models, check out Understanding Multimodal LLMs.

Training

  1. DistTrain: Addressing Model and Data Heterogeneity with Disaggregated Training for Multimodal Large Language Models. arXiv 2024.
  2. DISTMM: Accelerating Distributed Multimodal Model Training. NSDI 2024.
  3. PipeWeaver: Addressing Data Dynamicity in Large Multimodal Model Training with Dynamic Interleaved Pipeline. arXiv 2025.

Serving

  1. Approximate Caching for Efficiently Serving Text-to-Image Diffusion Models. NSDI 2024.
  2. Katz: Efficient Workflow Serving for Diffusion Models with Many Adapters. ATC 2025.
  3. Understanding Diffusion Model Serving in Production: A Top-Down Analysis of Workload, Scheduling, and Resource Efficiency. SoCC 2025.

GPU Kernels

  1. Large Scale Diffusion Distillation via Score-Regularized Continuous-Time Consistency. arXiv 2025.
    • FlashAttention-2 JVP kernel for training
  2. SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration. ICLR 2025.
    • 8-bit attention kernel for inference
  3. SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention. arXiv 2025.
    • Sparse attention kernel for inference
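The actual SageAttention kernels involve refinements such as smoothing K and per-block scales, but the core idea in the list above — computing attention scores with 8-bit integer matmuls and dequantizing afterwards — can be illustrated with a rough NumPy sketch (the quantization scheme below is a generic symmetric per-tensor one, chosen for simplicity, not the paper's exact method):

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor quantization: scale maps max|x| to 127.
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8)).astype(np.float32)
K = rng.standard_normal((4, 8)).astype(np.float32)

q_q, s_q = quantize_int8(Q)
k_q, s_k = quantize_int8(K)

# Integer matmul with int32 accumulation, then dequantize the scores
# by the product of the two scales.
scores_int32 = q_q.astype(np.int32) @ k_q.astype(np.int32).T
scores_approx = scores_int32 * (s_q * s_k)

scores_exact = Q @ K.T
err = np.max(np.abs(scores_approx - scores_exact))
print(f"max abs score error: {err:.4f}")
```

The int8 matmul is where the speedup comes from on real hardware: int8 tensor-core throughput is a multiple of fp16 throughput, and the dequantization is a cheap elementwise scale.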

Engineering

Engineering best practices for building AI systems

Kernel Best Practices

  1. OpenAI Triton Best Practices
    • A hands-on tutorial on best practices for writing efficient GPU kernels using Triton.

PyTorch Best Practices

  1. Optimize Training Performance in PyTorch
    • Model FLOPs Utilization (MFU), PyTorch Profiler & Nsight Systems for performance monitoring, Triton and Nsight Compute for kernel optimization.
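As a back-of-the-envelope illustration of MFU — not taken from the tutorial, and using hypothetical model/hardware numbers — one can estimate training FLOPs with the common ~6·N·T approximation (N parameters, T tokens, forward plus backward) and divide the achieved rate by the hardware's peak:

```python
def model_flops_utilization(num_params, tokens_per_iter, iter_time_s, peak_flops):
    """Estimate Model FLOPs Utilization (MFU) for transformer training.

    Uses the common ~6 * N * T approximation for forward+backward
    FLOPs per training iteration (N = parameters, T = tokens).
    """
    achieved_flops_per_s = 6 * num_params * tokens_per_iter / iter_time_s
    return achieved_flops_per_s / peak_flops

# Hypothetical example: a 7B-parameter model processing 4096 tokens
# per 1.1 s iteration on a GPU with 312 TFLOP/s peak (bf16).
mfu = model_flops_utilization(7e9, 4096, 1.1, 312e12)
print(f"MFU: {mfu:.1%}")  # ~50%
```

MFU is useful precisely because it is hardware-normalized: a profiler tells you where time goes, while MFU tells you how far you are from the roofline.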

Multimodal Best Practices

  1. Disaggregated Hybrid Parallelism with Ray
    • A framework for training vision-language models using disaggregated hybrid parallelism, where each model component adopts its optimal parallelization strategy independently.
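The cited framework builds on Ray; purely as an illustration of what "each component adopts its own parallelization strategy" means (the names and GPU counts below are hypothetical, not the framework's API), a minimal sketch:

```python
from dataclasses import dataclass

@dataclass
class ParallelPlan:
    """Per-component parallelization strategy (illustrative only)."""
    tensor_parallel: int
    data_parallel: int
    pipeline_parallel: int = 1

    @property
    def world_size(self):
        return self.tensor_parallel * self.data_parallel * self.pipeline_parallel

# A small vision encoder replicates cheaply via data parallelism,
# while the large LLM backbone shards with tensor parallelism —
# each component independently, rather than one global strategy.
plans = {
    "vision_encoder": ParallelPlan(tensor_parallel=1, data_parallel=8),
    "projector":      ParallelPlan(tensor_parallel=1, data_parallel=8),
    "llm_backbone":   ParallelPlan(tensor_parallel=4, data_parallel=2),
}

for name, plan in plans.items():
    print(f"{name}: TP={plan.tensor_parallel} DP={plan.data_parallel} "
          f"(GPUs={plan.world_size})")
```

The disaggregated part is exactly this decoupling: since a monolithic strategy must compromise between heterogeneous components, giving each its best-fit plan (and exchanging activations between them) can raise overall utilization.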

Ray Best Practices

  1. Use Ray for Distributed Machine Learning Apps
    • Scaling up machine learning applications using Ray.
