Databricks Data Intelligence Platform

by Nikhil Gupta and Jason Yip

Book Details

Book Title: Databricks Data Intelligence Platform
Author: Nikhil Gupta, Jason Yip
Publisher: Apress
Publication Date: 2024
ISBN: 9798868804441
Number of Pages: 481
Language: English
Format: PDF
File Size: 5.5 MB
Subject: Artificial Intelligence / Generative AI

Table of Contents

  • About the Authors
  • About the Technical Reviewers
  • Chapter 1: Databricks Platform: From Lakehouse to Data Intelligence Platform
      • Data Platforms: Historical Perspective
      • Emergence of the Lakehouse
      • What Is a Lakehouse?
      • What Is the Databricks Lakehouse?
      • Key Features of the Databricks Lakehouse Platform
      • Introducing the Databricks Data Intelligence Platform
      • Conclusion
  • Chapter 2: Databricks Platform Overview
      • Key Terminology
      • Databricks Compute or Clusters
      • Databricks All-Purpose Cluster Setup
      • Cluster Sizing Considerations and Best Practices
      • Databricks Notebooks
      • Library Management
      • External Databricks Connectivity
      • Conclusion
  • Chapter 3: Data Ingestion in Lakehouse
      • Introduction
      • Cloud Ingestion
      • Delta Ingestion
      • Conclusion
  • Chapter 4: Delta Lake - Deep Dive
      • The Challenges of Other Formats
      • What Is Delta Lake?
      • Delta Lake: Medallion Architecture
      • Delta Lake Key Features
      • Time Travel
      • Clone Delta Tables
      • Generated Column
      • Change Data Feed
      • Universal Format
      • Delta Optimization
      • Liquid Clustering
      • Working with Liquid Clustering
      • Current Limitations
      • Predictive I/O
      • Conclusion
  • Chapter 5: Data Governance with Unity Catalog
      • What Is Databricks Unity Catalog?
      • Unity Catalog: Before and After
      • Unity Catalog Hierarchy
      • Unity Catalog Admin Roles
      • Organizing Data in Unity Catalog
      • Key Features of Unity Catalog
      • Data Lineage
      • Data Access Auditing
      • Data Search and Discovery
      • Row-Level Security and Column-Level Masking
      • Delta Sharing
      • Conclusion
  • Chapter 6: Data Engineering Part 1: Orchestrating Data Pipelines Using Databricks Workflows
      • Databricks Workflow Jobs
      • Databricks Jobs and Tasks
      • Advanced Workflow Features
      • Monitoring Data Pipelines
      • Conclusion
  • Chapter 7: Data Engineering Part 2: Delta Live Tables
      • What Is Delta Live Tables?
      • Creating a DLT Pipeline
      • Logging and Monitoring
      • Enhanced Autoscaling
      • Runtime Channels
      • Example: A Retail Sales Pipeline
      • Conclusion
  • Chapter 8: Data Warehousing with DBSQL
      • What Is Databricks SQL?
      • SQL Warehouses
      • Constraints in DBSQL
      • Streaming Tables and Materialized Views
      • Materialized Views
      • Connect Power BI Desktop to Databricks
      • Conclusion
  • Chapter 9: Machine Learning Operations Using Databricks
      • Machine Learning with Databricks
      • Machine Learning Lifecycle: MLOps
  • Chapter 10: Generative AI with Databricks
      • What Is Generative AI?
      • Databricks Generative AI
      • The GenAI Journey
      • Prompt Engineering
      • Retrieval Augmented Generation
      • Mosaic AI Fine-Tuning API
      • Pre-Training
      • Gen AI Pricing
      • Conclusion
  • Chapter 11: Large Language Model Operations
      • Machine Learning Operations
      • Large Language Model Operations
      • Components of LLMOps
      • Deep Dive into Each Process
      • A Case Study of AI2’s OLMo
      • Conclusion
  • Chapter 12: Mosaic AI Agent Framework: Creating Quality AI Agents
      • Part 0: The Installations
      • Part 1: LangChain Parametrization
      • Part 2: MLflow Evaluation
      • Part 3: Model Development
      • Part 4: Deployment
      • Evaluation Example
      • Conclusion
  • Chapter 13: DBRX: Creating an LLM from Scratch Using Databricks
      • What Is DBRX?
      • The DBRX Benchmarks
      • DBRX Architecture
      • The MosaicML Stack
      • Distributed GPU Training
      • Model Serving
      • Using DBRX on Databricks
      • Conclusion
  • Chapter 14: The Databricks Data Intelligence Platform
      • Databricks IQ
      • Deep Dive into Databricks IQ
  • Chapter 15: Databricks CI/CD
      • What Is CI/CD?
      • Stages of CI/CD
      • Introduction to Databricks Repos
      • Databricks UI vs. Git Terminologies
      • Databricks Asset Bundles
      • Case Study: Databricks MLOps Stack
      • Conclusion
  • Chapter 16: Databricks Pricing and Observability Using System Tables
      • Costs Associated with the Databricks Platform
      • Cloud Infrastructure Costs
      • Databricks Pricing
      • Databricks Cost Management Best Practices
      • Databricks Observability: System Tables
      • Conclusion
  • Chapter 17: Databricks Platform Security and Compliance
      • Databricks Architecture
      • Azure Databricks Deployment
      • Identity and Access
      • Security Analysis Tool
      • Databricks Security Best Practices
      • Conclusion
  • Chapter 18: Spark Structured Streaming: A Comprehensive Guide
      • Spark Streaming
      • Structured Streaming
      • What Is Continuous Processing?
      • Triggers
      • Output Modes
      • Windowed Grouped Aggregation
      • State Management
      • Late-Arrival Handling: Watermark
      • Auto Loader
      • Project Lightspeed
      • Structured Streaming Best Practices
      • Conclusion
  • Chapter 19: From Ideation to Creation: A Walk-Through of Building a GenAI Application
      • The Problem Statement
      • Data Generation: Source
      • Data Ingestion: Ingest
      • Data Transformation: Transform
      • Machine Learning Model for Diabetes Complication Classification: Query and Process
      • Generative AI: Serve
      • Monitoring Dashboard: Analysis
      • Conclusion
  • Index