Serverless Data Processing with Dataflow: Foundations

Google Cloud

This course is the first in a three-course series on Serverless Data Processing with Dataflow, offered by Google Cloud. It builds a foundation in Apache Beam, Dataflow, and the Beam Portability framework. Across its modules, learners see how Dataflow separates compute from storage, how Identity and Access Management (IAM) tools interact with Dataflow pipelines, and how to implement the right security model for their use case.

  • Refresh on Apache Beam and its relationship with Dataflow
  • Explore the benefits of the Beam Portability framework
  • Learn how Dataflow allows separation of compute and storage while saving costs
  • Understand how Identity and Access Management (IAM) tools interact with Dataflow pipelines
  • Implement the right security model for specific use cases on Dataflow

The course is designed for learners familiar with the Data Engineering specialization. Prior completion of the prerequisite courses covering core Dataflow principles and streaming basics is recommended.

Certificate Available ✔

Course Modules

The course opens with a brief introduction and a refresher on Apache Beam and Dataflow. In-depth modules follow on Beam Portability; separating compute and storage with Dataflow; IAM, quotas, and permissions; and security. The course concludes with a summary and additional resources.

Introduction

The Introduction module provides an overview of the course, including a refresher on Apache Beam, its relationship with Dataflow, and instructions on how to send feedback.

Beam Portability

The Beam Portability module delves into the benefits of the Beam Portability framework, exploring topics such as Runner v2, container environments, and cross-language transforms.

Separating Compute and Storage with Dataflow

The Separating Compute and Storage with Dataflow module covers essential aspects of Dataflow, including the Dataflow Shuffle Service, Dataflow Streaming Engine, and flexible resource scheduling.
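As a rough illustration of how the three features named above are usually switched on, the sketch below assembles pipeline flags for the Apache Beam Python SDK. The helper name compute_storage_flags and its parameters are hypothetical and are not taken from the course. The flags themselves are standard Dataflow options, though current Dataflow releases may already enable some of them by default.

```python
from __future__ import annotations


def compute_storage_flags(streaming: bool, use_flexrs: bool = False) -> list[str]:
    """Build Dataflow flags for a hypothetical job (illustrative helper only)."""
    if streaming and use_flexrs:
        # Flexible resource scheduling (FlexRS) applies to batch jobs only.
        raise ValueError("FlexRS cannot be combined with a streaming job")

    flags = ["--runner=DataflowRunner"]
    if streaming:
        # Streaming Engine moves streaming state out of the worker VMs.
        flags += ["--streaming", "--enable_streaming_engine"]
    else:
        # Dataflow Shuffle moves batch shuffle operations out of the worker VMs.
        flags.append("--experiments=shuffle_mode=service")
        if use_flexrs:
            # FlexRS trades a delayed start for lower-cost batch processing.
            flags.append("--flexrs_goal=COST_OPTIMIZED")
    return flags


if __name__ == "__main__":
    print(compute_storage_flags(streaming=False, use_flexrs=True))
```

The returned list can be passed to PipelineOptions in a Beam pipeline or appended to the command line that launches one.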

IAM, Quotas, and Permissions

The IAM, Quotas, and Permissions module explains how IAM, quotas, and permissions apply to Dataflow jobs, so that learners can manage access to their pipelines effectively.

Security

The Security module focuses on data locality, shared VPC, private IPs, and Customer-Managed Encryption Keys (CMEK). The lab at the end allows learners to set up IAM and networking for their Dataflow jobs.
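The security topics above map onto a handful of pipeline flags in the Apache Beam Python SDK. The following is a minimal sketch, not the lab's solution: the helper security_flags and every argument value shown (project, subnetwork, key, and service account names) are made-up placeholders.

```python
from __future__ import annotations


def security_flags(region: str, host_project: str, subnetwork: str,
                   kms_key: str, service_account: str) -> list[str]:
    """Build Dataflow security-related flags (illustrative helper only)."""
    # A Shared VPC subnetwork is referenced by its full URL in the host project.
    subnet_url = (
        "https://www.googleapis.com/compute/v1/"
        f"projects/{host_project}/regions/{region}/subnetworks/{subnetwork}"
    )
    return [
        f"--region={region}",                          # data locality
        f"--subnetwork={subnet_url}",                  # Shared VPC
        "--no_use_public_ips",                         # private IPs only
        f"--dataflow_kms_key={kms_key}",               # CMEK
        f"--service_account_email={service_account}",  # worker identity (IAM)
    ]


if __name__ == "__main__":
    # All values below are placeholders, not real resources.
    for flag in security_flags(
        region="us-central1",
        host_project="example-host-project",
        subnetwork="example-subnet",
        kms_key=("projects/example-project/locations/us-central1/"
                 "keyRings/example-ring/cryptoKeys/example-key"),
        service_account="worker@example-project.iam.gserviceaccount.com",
    ):
        print(flag)
```

Workers launched without public IPs still need a route to Google APIs, which typically means enabling Private Google Access on the subnetwork.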

Summary

The Summary module offers a concise recap of the course content, followed by additional resources for further learning and exploration.
