Hands-On Generative AI with Transformers and Diffusion Models

by Omar Sanseviero

Book Details

Book Title: Hands-On Generative AI with Transformers and Diffusion Models
Author: Omar Sanseviero
Publisher: O'Reilly Media
Publication Date: 2024
ISBN: 9781098149246
Number of Pages: 576
Language: English
Format: PDF
File Size: 7 MB
Subject: Artificial Intelligence

Table of Contents

  • Preface
  • I. Leveraging Open Models
      • 1. An Introduction to Generative Media
          • Generating Images
          • Generating Text
          • Generating Sound Clips
          • Ethical and Societal Implications
          • Where We’ve Been and Where Things Stand
          • How Are Generative AI Models Created?
          • Summary
      • 2. Transformers
          • A Language Model in Action
          • A Transformer Block
          • Transformer Model Genealogy
          • The Power of Pretraining
          • Transformers Recap
          • Project Time: Using LMs to Generate Text
          • Summary
          • Exercises
          • Challenges
          • References
      • 3. Compressing and Representing Information
          • AutoEncoders
          • Variational AutoEncoders
          • CLIP
          • Alternatives to CLIP
          • Project Time: Semantic Image Search
          • Summary
          • Exercises
          • Challenges
          • References
      • 4. Diffusion Models
          • The Key Insight: Iterative Refinement
          • Training a Diffusion Model
          • In Depth: Noise Schedules
          • In Depth: UNets and Alternatives
          • In Depth: Diffusion Objectives
          • Project Time: Train Your Diffusion Model
          • Summary
          • Exercises
          • Challenges
          • References
      • 5. Stable Diffusion and Conditional Generation
          • Adding Control: Conditional Diffusion Models
          • Improving Efficiency: Latent Diffusion
          • Stable Diffusion: Components in Depth
          • Putting It All Together: Annotated Sampling Loop
          • Open Data, Open Models
          • Project Time: Build an Interactive ML Demo with Gradio
          • Summary
          • Exercises
          • Challenge
          • References
  • II. Transfer Learning for Generative Models
      • 6. Fine-Tuning Language Models
          • Classifying Text
          • Generating Text
          • Instructions
          • A Quick Introduction to Adapters
          • A Light Introduction to Quantization
          • Putting It All Together
          • A Deeper Dive into Evaluation
          • Project Time: Retrieval-Augmented Generation
          • Summary
          • Exercises
          • Challenge
          • References
      • 7. Fine-Tuning Stable Diffusion
          • Full Stable Diffusion Fine-Tuning
          • DreamBooth
          • Training LoRAs
          • Giving Stable Diffusion New Capabilities
          • Project Time: Train an SDXL DreamBooth LoRA by Yourself
          • Summary
          • Exercises
          • Challenge
          • References
  • III. Going Further
      • 8. Creative Applications of Text-to-Image Models
          • Image to Image
          • Inpainting
          • Prompt Weighting and Image Editing
          • Real Image Editing via Inversion
          • ControlNet
          • Image Prompting and Image Variations
          • Project Time: Your Creative Canvas
          • Summary
          • Exercises
          • References
      • 9. Generating Audio
          • Audio Data
          • Speech to Text with Transformer-Based Architectures
          • From Text to Speech to Generative Audio
          • Evaluating Audio-Generation Systems
          • What’s Next?
          • Project Time: End-to-End Conversational System
          • Summary
          • Exercises
          • Challenges
          • References
      • 10. Rapidly Advancing Areas in Generative AI
          • Preference Optimization
          • Long Contexts
          • Mixture of Experts
          • Optimizations and Quantizations
          • Data
          • One Model to Rule Them All
          • Computer Vision
          • 3D Computer Vision
          • Video Generation
          • Multimodality
          • Community
  • A. Open Source Tools
  • B. LLM Memory Requirements
  • C. End-to-End Retrieval-Augmented Generation
  • Index
  • About the Authors