Courses and Tutorials
NoSQL, Big Data, and Spark Foundations Specialization
- Introduction to NoSQL Databases
- Introduction to Big Data with Spark and Hadoop
- Week 1: What is Big Data?
- Reading: Course Introduction
- Video: What is Big Data?
- Video: Impact of Big Data
- Video: Parallel Processing, Scaling, and Data Parallelism
- Video: Big Data Tools and Ecosystem
- Video: Open Source and Big Data
- Video: Beyond the Hype
- Video: Big Data Use Cases
- Reading: Summary & Highlights
- Practice Quiz: Introduction to Big Data
- Week 2: Introduction to the Hadoop Ecosystem
- Video: Introduction to Hadoop
- Video: Intro to MapReduce
- Video: Hadoop Ecosystem
- Video: HDFS
- Video: HIVE
- Video: HBASE
- LTI Item: Hands-on Lab: Hadoop MapReduce
- LTI Item: Hands-on Lab: Hadoop Cluster (Optional)
- Reading: Summary & Highlights
- Practice Quiz: Introduction to Hadoop
- Week 3: Apache Spark
- Video: Why use Apache Spark?
- Video: Functional Programming Basics
- Video: Parallel Programming using Resilient Distributed Datasets
- Video: Scale out / Data Parallelism in Apache Spark
- Video: Dataframes and SparkSQL
- LTI Item: Hands-on Lab: Getting Started with Spark using Python
- Reading: Summary & Highlights
- Practice Quiz: Introduction to Apache Spark
- Week 4: DataFrames and SparkSQL
- Video: RDDs in Parallel Programming and Spark
- Video: Data-frames and Datasets
- Video: Catalyst and Tungsten
- Video: ETL with DataFrames
- LTI Item: Hands-on Lab: Introduction to Data-Frames
- Video: Real-world usage of SparkSQL
- LTI Item: Hands-On Lab: Introduction to SparkSQL
- Reading: Summary & Highlights
- Practice Quiz: Introduction to Data-Frames & SparkSQL
- Week 5: Development and Runtime Environment Options
- Video: Apache Spark Architecture
- Video: Overview of Apache Spark Cluster Modes
- Video: How to Run an Apache Spark Application
- LTI Item: Hands-on Lab: Submit Apache Spark Applications
- Reading: Summary & Highlights
- Practice Quiz: Practice Quiz: Spark Architecture
- Video: Using Apache Spark on IBM Cloud
- LTI Item: Activate Trial Account
- Ungraded Plugin: Hands-on Lab: Getting started with Spark on IBM Cloud
- Video: Setting Apache Spark Configuration
- Video: Running Spark on Kubernetes
- LTI Item: Hands-on Lab: Apache Spark on Kubernetes
- Reading: Summary & Highlights
- Practice Quiz: Practice Quiz: Spark Runtime Environments
- Week 6: Monitoring & Tuning
- Video: The Apache Spark User Interface
- Video: Monitoring Application Progress
- Video: Debugging Apache Spark Application Issues
- Video: Understanding Memory Resources
- Video: Understanding Processor Resources
- LTI Item: Hands-on Lab: Monitoring and Performance Tuning
- Reading: Summary & Highlights
- Practice Quiz: Introduction to Monitoring & Tuning
- Reading: Instructions for the Final Exam
- Reading: Congrats & Next Steps
- Reading: Team & Acknowledgements
- Data Engineering and Machine Learning using Spark
- Week 1: Spark for Data Engineering
- Reading: Course Introduction
- Video: Spark Structured Streaming
- Video: GraphFrames on Apache Spark
- Video: ETL Workloads
- Video: Introduction to the pipeline editor in Elyra (Optional)
- Ungraded Plugin: Reading: Create component-oriented data science pipelines using CLAIMED, Elyra, KubeFlow Pipelines, MLX and Kubernetes
- LTI Item: Hands-on Lab: ETL using Apache Spark
- Reading: Summary & Highlights
- Practice Quiz: Spark for Data Engineering
- Week 2: SparkML
- Video: SparkML Fundamentals
- Video: Classification and Regression using Apache Spark
- Video: SparkML Clustering
- LTI Item: Obtain an IBM Cloud Feature Code
- Ungraded Plugin: Jupyter Notebook for Hands-on Lab: Machine Learning with Apache Spark ML
- LTI Item: Optional: Hands-on Lab: Introduction to SparkML
- Reading: Summary & Highlights
- Practice Quiz: SparkML
- Week 3: Final Project
- Reading: Project Overview
- LTI Item: Hands-on Lab: ETL and Machine Learning
- Reading: Congratulations & Next Steps
- Reading: Team & Acknowledgements
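The MapReduce and functional-programming lessons above can be illustrated with the classic word count, sketched here in plain Python. This is a single-process sketch that assumes all data fits in memory. It is not Hadoop's API, and the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: combine all values for each key, here by summing the counts."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Chain the three phases into one job."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    docs = ["Spark and Hadoop", "Hadoop MapReduce", "spark"]
    print(word_count(docs))  # {'spark': 2, 'and': 1, 'hadoop': 2, 'mapreduce': 1}
```

In PySpark, the same pipeline is commonly written on an RDD with three operations: `flatMap` splits lines into words, `map` pairs each word with 1, and `reduceByKey` sums the counts. Spark performs the shuffle step itself.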
Data Engineering with MS Azure Synapse Apache Spark Pools
- Week 1: Big Data Engineering
- Video: Introduction to the course
- Reading: Course syllabus
- Reading: How to be successful in this course
- Discussion Prompt: Meet and greet
- Video: What is an Apache Spark pool in Azure Synapse Analytics?
- Video: How do Apache Spark pools work in Azure Synapse Analytics?
- Reading: When do you use Apache Spark pools in Azure Synapse Analytics?
- Practice Quiz: Knowledge check
- Video: Lesson summary
- Video: Introduction to spark notebooks
- Video: Understand the use-cases for spark notebooks
- Reading: Create a spark notebook in Azure Synapse Analytics
- Reading: Discover supported languages in spark notebooks
- Reading: Develop spark notebooks
- Reading: Develop spark notebooks
- Video: Run spark notebooks
- Reading: Run spark notebooks
- Reading: Load data in Spark notebooks
- Reading: Load data in Spark notebooks
- Reading: Save Spark notebooks
- Practice Quiz: Knowledge check
- Video: Lesson summary
- Video: Introduction to DataFrames in Spark pools in Azure Synapse Analytics
- Video: Load data in a Spark DataFrame
- Reading: Load data into a Spark DataFrame
- Reading: Create an Apache Spark table
- Video: Flatten nested structures and explode arrays with Apache Spark
- Reading: Flatten nested structures and explode arrays with Apache Spark in Synapse
- Practice Quiz: Knowledge check
- Video: Lesson summary
- Week 2: Query pools and manage workloads in Azure Synapse Analytics
- Video: Describe the integration methods between SQL and Spark pools in Azure Synapse Analytics
- Video: Understand the use-cases for SQL and Spark pools integration
- Video: Authenticate in Azure Synapse Analytics
- Reading: Transfer data between SQL and Spark pool in Azure Synapse Analytics
- Reading: Authenticate between Spark and SQL pool in Azure Synapse Analytics
- Reading: Integrate SQL and Spark pools in Azure Synapse Analytics
- Video: Externalize the use of Spark pools within Azure Synapse Workspace
- Reading: Transfer data outside the Synapse workspace using the PySpark connector
- Video: Transfer data outside the Synapse workspace using the PySpark connector
- Practice Quiz: Knowledge check
- Video: Lesson summary
- Video: Monitor Spark pools in Azure Synapse Analytics
- Reading: Base-line Apache Spark performance with Apache Spark history server in Azure Synapse Analytics
- Video: Optimize Apache Spark jobs in Azure Synapse Analytics
- Reading: Automate scaling of Apache Spark pools in Azure Synapse Analytics
- Practice Quiz: Knowledge check
- Video: Lesson summary
- Week 3: Practice Exam on Perform data engineering with Azure Synapse Apache Spark Pools
- Reading: About the practice exam
- Video: Course 6 recap
- Video: Course wrap up
- Discussion Prompt: Reflect on learning
- Reading: Next steps
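The flatten and explode lessons above deal with struct columns and array columns. The plain-Python code below is a rough sketch that runs without a Spark pool and mimics the row-level behavior on dictionaries. Nested fields become prefixed top-level columns, and exploding an array column yields one row per element. The helper names `flatten` and `explode` and the `_` naming convention for flattened columns are hypothetical choices, not Synapse or PySpark APIs.

```python
from __future__ import annotations

from typing import Any


def flatten(record: dict[str, Any], prefix: str = "", sep: str = "_") -> dict[str, Any]:
    """Turn nested dicts (struct columns) into top-level keys such as 'address_city'."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse so that arbitrarily deep structs are flattened too.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def explode(rows: list[dict[str, Any]], column: str) -> list[dict[str, Any]]:
    """Emit one row per element of `column`.

    Rows whose array is empty or None produce no output, like Spark's explode
    (explode_outer would instead keep them with a null value).
    """
    out: list[dict[str, Any]] = []
    for row in rows:
        for item in row.get(column) or []:
            out.append({**row, column: item})
    return out


if __name__ == "__main__":
    rows = [
        {"id": 1, "address": {"city": "Seattle", "zip": "98101"}, "tags": ["a", "b"]},
        {"id": 2, "address": {"city": "Austin", "zip": "73301"}, "tags": []},
    ]
    for row in explode([flatten(r) for r in rows], "tags"):
        print(row)
```

In PySpark itself, nested fields are usually selected with dotted paths such as `col("address.city")`, and arrays are expanded with `pyspark.sql.functions.explode`. `explode_outer` differs only in keeping rows whose array is null or empty.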
Data Science with Databricks for Data Analysts Specialization
- Apache Spark (TM) SQL for Data Analysts
- Week 1: Welcome to Apache Spark SQL for Data Analysts
- 1 video
- Course goals (1m)
- 1 reading
- Before you begin (5m)
- Week 2: Spark makes big data easy
- 6 videos
- Introduction to module 2 (1m)
- What is big data? (6m)
- Common struggles with big data (4m)
- Big Data Needs (2m)
- Apache Spark Intro (3m)
- Spark SQL (2m)
- 1 practice exercise
- Module 2 Concept Review (30m)
- Week 3: Using Spark SQL on Databricks
- 9 videos
- Introduction to Module 3 (1m)
- Signing up for Databricks Community Edition (1m)
- Preparing your workspace (2m)
- Working with notebooks (3m)
- Using course materials (6m)
- Basic queries with Spark SQL reading introduction (1m)
- Data Visualization on Databricks reading introduction (38s)
- Data visualization tools (1m)
- Exploratory Data Analysis lab introduction (25s)
- 4 readings
- Course Materials (5m)
- Basic Queries reading activity (30m)
- Data Visualization reading activity (30m)
- Your turn! Exploratory Data Analysis lab (30m)
- Week 4: Spark Under the Hood
- 7 videos
- Introduction to module 4 (54s)
- Understanding optimizations (7m)
- The physical cluster (3m)
- The SparkUI and SQL tab (2m)
- Optimizing query logic (4m)
- Impact of Caching (6m)
- Optimizing with selective data loading (6m)
- 1 practice exercise
- Module 4 Concept Review (30m)
- Week 5: Complex Queries
- 5 videos
- Introduction to module 5 (59s)
- What is nested data? (2m)
- Introduction to managing nested data (1m)
- Introduction to Manipulating Data (21s)
- Introduction to Data Munging (57s)
- 3 readings
- Managing Nested Data reading activity (30m)
- Manipulating Data reading activity (30m)
- 5.3 Data Munging Lab (30m)
- Week 6: Applied Spark SQL
- 7 videos
- Introduction to module 6 (1m)
- Complex data - common strategies (4m)
- About higher-order functions (3m)
- Higher-order functions introduction (19s)
- Introducing Aggregating and Summarizing Data (25s)
- Partitioning Tables Introduction (34s)
- Sharing Insights Lab Introduction (21s)
- 4 readings
- Higher Order Functions reading activity (30m)
- Aggregating and Summarizing Data reading activity (30m)
- Partitioning Tables (10m)
- Sharing Insights (10m)
- Week 7: Data Storage and Optimization
- 5 videos
- Introduction to module 7 (1m)
- A quick refresher (1m)
- Introducing a new data management paradigm (39s)
- Introduction to the lesson (48s)
- What is Delta Lake (5m)
- 4 readings
- Data Warehouses (10m)
- Data Lakes (10m)
- Data Lakes vs Data Warehouses (10m)
- The Lakehouse (15m)
- Week 8: Delta Lake with Spark SQL
- 5 videos
- Introduction to the module (1m)
- Intro to Using Delta reading (49s)
- Managing Records in a Delta table (34s)
- Delta Engine Optimization Introduction (40s)
- Delta Lake Lab Introduction (17s)
- 4 readings
- 8.1 Using Delta (10m)
- 8.2 Managing records (10m)
- 8.3 Optimizing Delta (10m)
- Delta Lab (45m)
- Week 9: SQL Coding Challenges
- 1 reading
- SQL coding challenges (1h)
- 1 practice exercise
- Final Exam (30m)
- Data Science Fundamentals for Data Analysts
- Applied Data Science for Data Analysts
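The partitioning material above relies on a Hive-style layout. A table partitioned by some columns stores each distinct combination of values in its own `column=value` directory. A filter on those columns then lets the engine skip whole directories, which is called partition pruning. The plain-Python sketch below models that layout as a dictionary from directory path to rows. It is a simplified illustration, not Databricks or Delta Lake code, and the function names `partition_path`, `write_partitioned`, and `prune` are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


def partition_path(row: dict[str, Any], partition_cols: list[str]) -> str:
    """Build a Hive-style directory name such as 'year=2021/month=5'."""
    return "/".join(f"{col}={row[col]}" for col in partition_cols)


def write_partitioned(
    rows: list[dict[str, Any]], partition_cols: list[str]
) -> dict[str, list[dict[str, Any]]]:
    """Group rows by partition directory, the way a partitioned write lays them out."""
    layout: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for row in rows:
        # Partition values live in the path, so they are not repeated in the stored row.
        stored = {k: v for k, v in row.items() if k not in partition_cols}
        layout[partition_path(row, partition_cols)].append(stored)
    return dict(layout)


def prune(layout: dict[str, list[dict[str, Any]]], **equals: Any) -> list[str]:
    """Return only the directories whose key=value segments satisfy every equality filter."""
    selected: list[str] = []
    for path in layout:
        # Parse 'year=2021/month=5' back into {'year': '2021', 'month': '5'}.
        values = dict(segment.split("=", 1) for segment in path.split("/"))
        if all(values.get(col) == str(wanted) for col, wanted in equals.items()):
            selected.append(path)
    return selected


if __name__ == "__main__":
    sales = [
        {"id": 1, "year": 2020, "month": 12, "amount": 10},
        {"id": 2, "year": 2021, "month": 1, "amount": 20},
        {"id": 3, "year": 2021, "month": 2, "amount": 30},
    ]
    table = write_partitioned(sales, ["year", "month"])
    # Only the 2021 directories need to be read for a filter on year = 2021.
    print(prune(table, year=2021))  # ['year=2021/month=1', 'year=2021/month=2']
```

Partition columns are dropped from the stored rows because their values are already encoded in the path. This mirrors how Spark's file-based partitioned writes behave.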
Distributed Computing with Spark SQL
- Week 1: Introduction to Spark
- Discussion Prompt: Learning Goals
- Reading: A Note From UC Davis
- Video: Course Introduction
- Video: Why Distributed Computing?
- Video: Spark DataFrames
- Video: The Databricks Environment
- Video: SQL in Notebooks
- Video: Import Data
- Reading: Readings and Resources
- Reading: Assignment #1 - Queries in Spark SQL
- Week 2: Spark Core Concepts
- Video: Module Introduction
- Video: Spark Terminology
- Video: Caching
- Video: Shuffle Partitions
- Video: Spark UI
- Video: Adaptive Query Execution (AQE)
- Reading: Readings
- Reading: Assignment #2 - Spark Internals
- Week 3: Engineering Data Pipelines
- Video: Module Introduction
- Video: Spark as a Connector
- Video: Accessing Data
- Video: File Formats
- Video: JSON, Schemas and Types
- Video: Writing Data
- Video: Tables and Views
- Reading: Readings
- Reading: Assignment #3 - Engineering Data Pipelines
- Week 4: Data Lakes, Warehouses and Lakehouses
- Video: Module Introduction
- Video: Data Lakes vs. Data Warehouses
- Video: What is a Lakehouse?
- Video: Delta Lake
- Video: Delta Lake (Demo)
- Video: Delta Advanced Features (Demo)
- Video: Continuing with Spark and Data Science
- Reading: Readings
- Reading: Assignment #4 - Lakehouse
- Video: Course Summary
- Discussion Prompt: Self-Reflection
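The "Shuffle Partitions" lesson above concerns how a wide transformation such as a `groupBy` redistributes rows so that all rows with the same key land in the same partition. In Spark SQL, the number of post-shuffle partitions is set by `spark.sql.shuffle.partitions`, which defaults to 200. The sketch below illustrates hash partitioning in plain Python. It uses `zlib.crc32` as a stand-in hash; Spark uses its own hash function, so real partition assignments will differ. The function names are hypothetical.

```python
from __future__ import annotations

import zlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Pick a partition index for a key; crc32 stands in for Spark's own hash function."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle(records: list[tuple[str, int]], num_partitions: int) -> list[list[tuple[str, int]]]:
    """Route every (key, value) record to the partition chosen by hashing its key."""
    partitions: list[list[tuple[str, int]]] = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


def sum_by_key(partitions: list[list[tuple[str, int]]]) -> dict[str, int]:
    """Aggregate each partition independently, then merge the per-partition results.

    A given key always hashes to the same partition, so no key appears in two
    partial results and merging is a plain dictionary update.
    """
    totals: dict[str, int] = {}
    for part in partitions:
        local: Counter[str] = Counter()
        for key, value in part:
            local[key] += value
        totals.update(local)
    return totals


if __name__ == "__main__":
    sales = [("books", 5), ("games", 3), ("books", 2), ("music", 4), ("games", 1)]
    parts = shuffle(sales, num_partitions=4)
    print([len(p) for p in parts])  # partition sizes; some may be empty
    print(sum_by_key(parts))
```

Setting the partition count much higher than the number of distinct keys leaves most partitions empty. This is one reason `spark.sql.shuffle.partitions` is often tuned, and why Adaptive Query Execution can coalesce small post-shuffle partitions at runtime.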
Data Warehousing with Microsoft Azure Synapse Analytics
Tutorials
- 3 Ways To Create Tables With Apache Spark and Download Data
- Introduction to Microsoft Spark Utilities - mssparkutils
- Mount in Synapse - mssparkutils
- Python by Example
- Spark by Examples
Note
To edit multiple lines at once:
- Shift+Option+I (adds a cursor to the end of each selected line)
- Shift+Home (moves each cursor to the start of its line)