Scalable Kubernetes Infrastructure for AI Platforms

by Alex Corvin, Taneem Ibrahim, and Kyle Stratis

Cloud Computing

Book Details

Book Title: Scalable Kubernetes Infrastructure for AI Platforms
Author: Alex Corvin, Taneem Ibrahim, and Kyle Stratis
Publisher: O'Reilly Media, Inc.
Publication Date: 2025
ISBN: 9798341608184
Number of Pages: 82
Language: English
Format: PDF
File Size: 3.25 MB
Subject: kubernetes/ai-infrastructure

Table of Contents

  • 1. Introduction
      • What Is MLOps?
      • Why Use Kubernetes for Your MLOps Platform?
  • 2. Model Development on Kubernetes
      • Overview of LLM Customization Techniques
      • Kubernetes-Native Model Training Tools
      • Managing Compute Resources for Training
  • 3. Making Training Repeatable
      • Retraining and the Model Development Lifecycle
      • Tracking Model Versions
      • Automating Model Training
      • GitOps for Model Training Pipelines
  • 4. Model Deployment and Monitoring
      • Overview of LLM Serving
      • Using a Model-Serving Platform
      • Diving Into LLM-Serving Runtimes with vLLM
      • Monitoring and Keeping Track of Your Models
  • 5. Responsible AI
      • Data Safety and Transparency
      • AI Guardrails
  • 6. Summary and Outlook
      • Personalized Healthcare Chatbot
      • Future Technology Outlook
  • About the Authors