🔅 Courses and Tutorials

NoSQL, Big Data, and Spark Foundations Specialization

  • Introduction to NoSQL Databases
  • Introduction to Big Data with Spark and Hadoop
  • Week 1: What is Big Data?
    • Reading: Course Introduction
    • Video: What is Big Data?
    • Video: Impact of Big Data
    • Video: Parallel Processing, Scaling, and Data Parallelism
    • Video: Big Data Tools and Ecosystem
    • Video: Open Source and Big Data
    • Video: Beyond the Hype
    • Video: Big Data Use Cases
    • Reading: Summary & Highlights
    • Practice Quiz: Practice Quiz: Introduction to Big Data
  • Week 2: Introduction to the Hadoop Ecosystem
    • Video: Introduction to Hadoop
    • Video: Intro to MapReduce
    • Video: Hadoop Ecosystem
    • Video: HDFS
    • Video: HIVE
    • Video: HBASE
    • LTI Item: Hands-on Lab: Hadoop MapReduce
    • LTI Item: Hands-on Lab: Hadoop Cluster (Optional)
    • Reading: Summary & Highlights
    • Practice Quiz: Practice Quiz: Introduction to Hadoop
  • Week 3: Apache Spark
    • Video: Why use Apache Spark?
    • Video: Functional Programming Basics
    • Video: Parallel Programming using Resilient Distributed Datasets
    • Video: Scale out / Data Parallelism in Apache Spark
    • Video: Dataframes and SparkSQL
    • LTI Item: Hands-on Lab: Getting Started with Spark using Python
    • Reading: Summary & Highlights
    • Practice Quiz: Practice Quiz: Introduction to Apache Spark
  • Week 4: DataFrames and SparkSQL
    • Video: RDDs in Parallel Programming and Spark
    • Video: Data-frames and Datasets
    • Video: Catalyst and Tungsten
    • Video: ETL with DataFrames
    • LTI Item: Hands-on Lab: Introduction to Data-Frames
    • Video: Real-world usage of SparkSQL
    • LTI Item: Hands-On Lab: Introduction to SparkSQL
    • Reading: Summary & Highlights
    • Practice Quiz: Practice Quiz: Introduction to Data-Frames & SparkSQL
  • Week 5: Development and Runtime Environment Options
    • Video: Apache Spark Architecture
    • Video: Overview of Apache Spark Cluster Modes
    • Video: How to Run an Apache Spark Application
    • LTI Item: Hands-on Lab: Submit Apache Spark Applications
    • Reading: Summary & Highlights
    • Practice Quiz: Practice Quiz: Spark Architecture
    • Video: Using Apache Spark on IBM Cloud
    • LTI Item: Activate Trial Account
    • Ungraded Plugin: Hands-on Lab: Getting started with Spark on IBM Cloud
    • Video: Setting Apache Spark Configuration
    • Video: Running Spark on Kubernetes
    • LTI Item: Hands-on Lab: Apache Spark on Kubernetes
    • Reading: Summary & Highlights
    • Practice Quiz: Practice Quiz: Spark Runtime Environments
  • Week 6: Monitoring & Tuning
    • Video: The Apache Spark User Interface
    • Video: Monitoring Application Progress
    • Video: Debugging Apache Spark Application Issues
    • Video: Understanding Memory Resources
    • Video: Understanding Processor Resources
    • LTI Item: Hands-on Lab: Monitoring and Performance Tuning
    • Reading: Summary & Highlights
    • Practice Quiz: Practice Quiz: Introduction to Monitoring & Tuning
    • Reading: Instructions for the Final Exam
    • Reading: Congrats & Next Steps
    • Reading: Team & Acknowledgements
  • Data Engineering and Machine Learning using Spark
  • Week 1: Spark for Data Engineering
    • Reading: Course Introduction
    • Video: Spark Structured Streaming
    • Video: GraphFrames on Apache Spark
    • Video: ETL Workloads
    • Video: Introduction to the pipeline editor in Elyra (Optional)
    • Ungraded Plugin: Reading: Create component oriented data science pipelines using CLAIMED, Elyra, KubeFlow Pipelines, MLX and Kubernetes
    • LTI Item: Hands-on Lab: ETL using Apache Spark
    • Reading: Summary & Highlights
    • Practice Quiz: Practice Quiz: Spark for Data Engineering
  • Week 2: SparkML
    • Video: SparkML Fundamentals
    • Video: Classification and Regression using Apache Spark
    • Video: SparkML Clustering
    • LTI Item: Obtain an IBM Cloud Feature Code
    • Ungraded Plugin: Jupyter Notebook for Hands-on Lab: Machine Learning with Apache Spark ML
    • LTI Item: Optional: Hands-on Lab: Introduction to SparkML
    • Reading: Summary & Highlights
    • Practice Quiz: Practice Quiz: SparkML
  • Week 3: Final Project
    • Reading: Project Overview
    • LTI Item: Hands-on Lab: ETL and Machine Learning
    • Reading: Congratulations & Next Steps
    • Reading: Team & Acknowledgements

Data Engineering with MS Azure Synapse Apache Spark Pools

  • Week 1: Big Data Engineering
    • Video: Introduction to the course
    • Reading: Course syllabus
    • Reading: How to be successful in this course
    • Discussion Prompt: Meet and greet
    • Video: What is an Apache Spark pool in Azure Synapse Analytics?
    • Video: How do Apache Spark pools work in Azure Synapse Analytics?
    • Reading: When do you use Apache Spark pools in Azure Synapse Analytics?
    • Practice Quiz: Knowledge check
    • Video: Lesson summary
    • Video: Introduction to Spark notebooks
    • Video: Understand the use cases for Spark notebooks
    • Reading: Create a Spark notebook in Azure Synapse Analytics
    • Reading: Discover supported languages in Spark notebooks
    • Reading: Develop Spark notebooks
    • Reading: Develop Spark notebooks
    • Video: Run Spark notebooks
    • Reading: Run Spark notebooks
    • Reading: Load data in Spark notebooks
    • Reading: Load data in Spark notebooks
    • Reading: Save Spark notebooks
    • Practice Quiz: Knowledge check
    • Video: Lesson summary
    • Video: Introduction to DataFrames in Spark pools in Azure Synapse Analytics
    • Video: Load data in a Spark DataFrame
    • Reading: Load data into a Spark DataFrame
    • Reading: Create an Apache Spark table
    • Video: Flatten nested structures and explode arrays with Apache Spark
    • Reading: Flatten nested structures and explode arrays with Apache Spark in Synapse
    • Practice Quiz: Knowledge check
    • Video: Lesson summary
  • Week 2: Query pools and manage workloads in Azure Synapse Analytics
    • Video: Describe the integration methods between SQL and Spark pools in Azure Synapse Analytics
    • Video: Understand the use cases for SQL and Spark pools integration
    • Video: Authenticate in Azure Synapse Analytics
    • Reading: Transfer data between SQL and Spark pool in Azure Synapse Analytics
    • Reading: Authenticate between Spark and SQL pool in Azure Synapse Analytics
    • Reading: Integrate SQL and Spark pools in Azure Synapse Analytics
    • Video: Externalize the use of Spark pools within Azure Synapse Workspace
    • Reading: Transfer data outside the Synapse workspace using the PySpark connector
    • Video: Transfer data outside the Synapse workspace using the PySpark connector
    • Practice Quiz: Knowledge check
    • Video: Lesson summary
    • Video: Monitor Spark pools in Azure Synapse Analytics
    • Reading: Base-line Apache Spark performance with Apache Spark history server in Azure Synapse Analytics
    • Video: Optimize Apache Spark jobs in Azure Synapse Analytics
    • Reading: Automate scaling of Apache Spark pools in Azure Synapse Analytics
    • Practice Quiz: Knowledge check
    • Video: Lesson summary
  • Week 3: Practice Exam on Perform data engineering with Azure Synapse Apache Spark Pools
    • Reading: About the practice exam
    • Video: Course 6 recap
    • Video: Course wrap up
    • Discussion Prompt: Reflect on learning
    • Reading: Next steps

Data Science with Databricks for Data Analysts Specialization

  • Apache Spark (TM) SQL for Data Analysts
  • Week 1: Welcome to Apache Spark SQL for Data Analysts
    • 1 video
    • Course goals (1m)
    • 1 reading
    • Before you begin (5m)
  • Week 2: Spark makes big data easy
    • 6 videos
    • Introduction to module 2 (1m)
    • What is big data? (6m)
    • Common struggles with big data (4m)
    • Big Data Needs (2m)
    • Apache Spark Intro (3m)
    • Spark SQL (2m)
    • 1 practice exercise
    • Module 2 Concept Review (30m)
  • Week 3: Using Spark SQL on Databricks
    • 9 videos
    • Introduction to Module 3 (1m)
    • Signing up for Databricks Community Edition (1m)
    • Preparing your workspace (2m)
    • Working with notebooks (3m)
    • Using course materials (6m)
    • Basic queries with Spark SQL reading introduction (1m)
    • Data Visualization on Databricks reading introduction (38s)
    • Data visualization tools (1m)
    • Exploratory Data Analysis lab introduction (25s)
    • 4 readings
    • Course Materials (5m)
    • Basic Queries reading activity (30m)
    • Data Visualization reading activity (30m)
    • Your turn! Exploratory Data Analysis lab (30m)
  • Week 4: Spark Under the Hood
    • 7 videos
    • Introduction to module 4 (54s)
    • Understanding optimizations (7m)
    • The physical cluster (3m)
    • The SparkUI and SQL tab (2m)
    • Optimizing query logic (4m)
    • Impact of Caching (6m)
    • Optimizing with selective data loading (6m)
    • 1 practice exercise
    • Module 4 Concept Review (30m)
  • Week 5: Complex Queries
    • 5 videos
    • Introduction to module 5 (59s)
    • What is nested data? (2m)
    • Introduction to managing nested data (1m)
    • Introduction to Manipulating Data (21s)
    • Introduction to Data Munging (57s)
    • 3 readings
    • Managing Nested Data reading activity (30m)
    • Manipulating Data reading activity (30m)
    • 5.3 Data Munging Lab (30m)
  • Week 6: Applied Spark SQL
    • 7 videos
    • Introduction to module 6 (1m)
    • Complex data - common strategies (4m)
    • About higher-order functions (3m)
    • Higher-order functions introduction (19s)
    • Introducing Aggregating and Summarizing Data (25s)
    • Partitioning Tables Introduction (34s)
    • Sharing Insights Lab Introduction (21s)
    • 4 readings
    • Higher Order Functions reading activity (30m)
    • Aggregating and Summarizing Data reading activity (30m)
    • Partitioning Tables (10m)
    • Sharing Insights (10m)
  • Week 7: Data Storage and Optimization
    • 5 videos
    • Introduction to module 7 (1m)
    • A quick refresher (1m)
    • Introducing a new data management paradigm (39s)
    • Introduction to the lesson (48s)
    • What is Delta Lake (5m)
    • 4 readings
    • Data Warehouses (10m)
    • Data Lakes (10m)
    • Data Lakes vs Data Warehouses (10m)
    • The Lakehouse (15m)
  • Week 8: Delta Lake with Spark SQL
    • 5 videos
    • Introduction to the module (1m)
    • Intro to Using Delta reading (49s)
    • Managing Records in a Delta table (34s)
    • Delta Engine Optimization Introduction (40s)
    • Delta Lake Lab Introduction (17s)
    • 4 readings
    • 8.1 Using Delta (10m)
    • 8.2 Managing records (10m)
    • 8.3 Optimizing Delta (10m)
    • Delta Lab (45m)
  • Week 9: SQL Coding Challenges
    • 1 reading
    • SQL coding challenges (1h)
    • 1 practice exercise
    • Final Exam (30m)
  • Data Science Fundamentals for Data Analysts
  • Applied Data Science for Data Analysts

Distributed Computing with Spark SQL

  • Week 1: Introduction to Spark
    • Discussion Prompt: Learning Goals
    • Reading: A Note From UC Davis
    • Video: Course Introduction
    • Video: Why Distributed Computing?
    • Video: Spark DataFrames
    • Video: The Databricks Environment
    • Video: SQL in Notebooks
    • Video: Import Data
    • Reading: Readings and Resources
    • Reading: Assignment #1 - Queries in Spark SQL
  • Week 2: Spark Core Concepts
    • Video: Module Introduction
    • Video: Spark Terminology
    • Video: Caching
    • Video: Shuffle Partitions
    • Video: Spark UI
    • Video: Adaptive Query Execution (AQE)
    • Reading: Readings
    • Reading: Assignment #2 - Spark Internals
  • Week 3: Engineering Data Pipelines
    • Video: Module Introduction
    • Video: Spark as a Connector
    • Video: Accessing Data
    • Video: File Formats
    • Video: JSON, Schemas and Types
    • Video: Writing Data
    • Video: Tables and Views
    • Reading: Readings
    • Reading: Assignment #3 - Engineering Data Pipelines
  • Week 4: Data Lakes, Warehouses and Lakehouses
    • Video: Module Introduction
    • Video: Data Lakes vs. Data Warehouses
    • Video: What is a Lakehouse?
    • Video: Delta Lake
    • Video: Delta Lake (Demo)
    • Video: Delta Advanced Features (Demo)
    • Video: Continuing with Spark and Data Science
    • Reading: Readings
    • Reading: Assignment #4 - Lakehouse
    • Video: Course Summary
    • Discussion Prompt: Self-Reflection

Data Warehousing with Microsoft Azure Synapse Analytics

Tutorials

Note

To edit multiple lines at once in VS Code (macOS):

  • Shift+Option+I (adds a cursor to the end of each selected line)
  • Shift+Home (moves each cursor to the start of its line, selecting the text in between)