Fundamentals of Data Engineering

by Joe Reis and Matt Housley

Book Details

Book Title: Fundamentals of Data Engineering
Authors: Joe Reis; Matt Housley
Publisher: O'Reilly Media, Inc.
Publication Date: 2022
ISBN: 9781098108304
Number of Pages: 636
Language: English
Format: PDF
File Size: 8.9 MB
Subject: Data science

Table of Contents

  • Preface
  • I. Foundation and Building Blocks
    • Chapter 1: Data Engineering Described
      • What Is Data Engineering?
      • Data Engineering Skills and Activities
      • Data Engineers Inside an Organization
      • Conclusion
      • Additional Resources
    • Chapter 2: The Data Engineering Lifecycle
      • What Is the Data Engineering Lifecycle?
      • Major Undercurrents Across the Data Engineering Lifecycle
      • Conclusion
      • Additional Resources
    • Chapter 3: Designing Good Data Architecture
      • What Is Data Architecture?
      • Principles of Good Data Architecture
      • Major Architecture Concepts
      • Examples and Types of Data Architecture
      • Who’s Involved with Designing a Data Architecture?
      • Conclusion
      • Additional Resources
    • Chapter 4: Choosing Technologies Across the Data Engineering Lifecycle
      • Team Size and Capabilities
      • Speed to Market
      • Interoperability
      • Cost Optimization and Business Value
      • Today Versus the Future: Immutable Versus Transitory Technologies
      • Location
      • Build Versus Buy
      • Monolith Versus Modular
      • Serverless Versus Servers
      • Optimization, Performance, and the Benchmark Wars
      • Undercurrents and Their Impacts on Choosing Technologies
      • Conclusion
      • Additional Resources
  • II. The Data Engineering Lifecycle in Depth
    • Chapter 5: Data Generation in Source Systems
      • Sources of Data: How Is Data Created?
      • Source Systems: Main Ideas
      • Source System Practical Details
      • Whom You’ll Work With
      • Undercurrents and Their Impact on Source Systems
      • Conclusion
      • Additional Resources
    • Chapter 6: Storage
      • Raw Ingredients of Data Storage
      • Data Storage Systems
      • Data Engineering Storage Abstractions
      • Big Ideas and Trends in Storage
      • Whom You’ll Work With
      • Undercurrents
      • Conclusion
      • Additional Resources
    • Chapter 7: Ingestion
      • What Is Data Ingestion?
      • Key Engineering Considerations for the Ingestion Phase
      • Batch Ingestion Considerations
      • Message and Stream Ingestion Considerations
      • Ways to Ingest Data
      • Whom You’ll Work With
      • Undercurrents
      • Conclusion
      • Additional Resources
    • Chapter 8: Queries, Modeling, and Transformation
      • Queries
      • Data Modeling
      • Transformations
      • Whom You’ll Work With
      • Undercurrents
      • Conclusion
      • Additional Resources
    • Chapter 9: Serving Data for Analytics, Machine Learning, and Reverse ETL
      • General Considerations for Serving Data
      • Analytics
      • Machine Learning
      • What a Data Engineer Should Know About ML
      • Ways to Serve Data for Analytics and ML
      • Reverse ETL
      • Whom You’ll Work With
      • Undercurrents
      • Conclusion
      • Additional Resources
  • III. Security, Privacy, and the Future of Data Engineering
    • Chapter 10: Security and Privacy
      • People
      • Processes
      • Technology
      • Conclusion
      • Additional Resources
    • Chapter 11: The Future of Data Engineering
      • The Data Engineering Lifecycle Isn’t Going Away
      • The Decline of Complexity and the Rise of Easy-to-Use Data Tools
      • The Cloud-Scale Data OS and Improved Interoperability
      • “Enterprisey” Data Engineering
      • Titles and Responsibilities Will Morph...
      • Moving Beyond the Modern Data Stack, Toward the Live Data Stack
      • Conclusion
  • Appendix A: Serialization and Compression Technical Details
    • Serialization Formats
    • Database Storage Engines
    • Compression: gzip, bzip2, Snappy, Etc.
  • Appendix B: Cloud Networking
    • Cloud Network Topology
    • CDNs
    • The Future of Data Egress Fees
  • Index
  • About the Authors