ETL and Data Pipelines with Shell, Airflow and Kafka

IBM

The "ETL and Data Pipelines with Shell, Airflow and Kafka" course from IBM covers two approaches to converting raw data into analytics-ready data: Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT).

Throughout the course, you will learn the fundamental concepts and practical implementation of data extraction, transformation, and loading. You will explore the components, processes, tools, and technologies involved in data pipelines, including batch versus concurrent modes of execution. You will also implement ETL pipelines through shell scripting and learn how ETL and ELT differ.

The course provides hands-on experience in building data pipelines with Apache Airflow and covers the advantages of that approach. You will also create streaming pipelines with Apache Kafka and become familiar with its core components: brokers, topics, partitions, replicas, producers, and consumers.

By the end of the course, you will be able to define data transformations, load data into data repositories, and ensure data quality. Through practical exercises and labs, you will learn to monitor load failures and apply recovery mechanisms. The course culminates in a shareable final project that showcases the skills acquired in each module.

Certificate Available ✔

Course Modules

This course comprises modules that cover data processing techniques, ETL and data pipeline tools and techniques, building data pipelines using Apache Airflow, building streaming pipelines using Apache Kafka, and a final assignment to demonstrate acquired skills.

Data Processing Techniques

Explore essential data processing techniques, including ETL fundamentals, ELT basics, and data extraction techniques. Gain insights into data transformation and loading techniques, and understand the differences between ETL and ELT processes. Engage in interactive quizzes and a practice quiz to solidify your understanding.
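To make the ETL versus ELT distinction concrete, here is a minimal sketch that is not taken from the course materials. It assumes an in-memory SQLite database as the data repository and a few made-up sales records. All table, column, and function names are hypothetical. The ETL path cleans the data in application code before loading. The ELT path loads the raw strings first and transforms them with SQL inside the repository.

```python
import sqlite3

# Hypothetical raw records; names and values are made up for illustration.
RAW_ROWS = [("alice", " 42.50 "), ("bob", "17"), ("carol", "  8.25")]


def run_etl(conn, rows):
    """ETL: transform in application code, then load only the cleaned result."""
    cleaned = [(name.title(), float(amount.strip())) for name, amount in rows]
    conn.execute("CREATE TABLE sales_etl (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", cleaned)


def run_elt(conn, rows):
    """ELT: load the raw data unchanged, then transform inside the repository."""
    conn.execute("CREATE TABLE sales_raw (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", rows)
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT upper(substr(customer, 1, 1)) || substr(customer, 2) AS customer,
               CAST(trim(amount) AS REAL) AS amount
        FROM sales_raw
        """
    )


def fetch(conn, table):
    """Return all rows of one of the hypothetical tables, sorted by customer."""
    query = f"SELECT customer, amount FROM {table} ORDER BY customer"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
run_etl(conn, RAW_ROWS)
run_elt(conn, RAW_ROWS)
etl_rows = fetch(conn, "sales_etl")
elt_rows = fetch(conn, "sales_elt")
print(etl_rows == elt_rows)  # both paths should yield the same cleaned table
```

In this sketch the ELT path keeps the untouched `sales_raw` table available for later, different transformations, which is one commonly cited reason for choosing ELT.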

ETL & Data Pipelines: Tools and Techniques

Dive into the world of ETL and data pipeline tools and techniques. Learn to implement ETL using shell scripting, understand key data pipeline processes, and differentiate between batch and streaming data pipeline use cases. Engage in hands-on labs and quizzes to reinforce your learning.

Building Data Pipelines using Airflow

Delve into Apache Airflow and its advantages in building data pipelines. Learn to build directed acyclic graphs (DAGs) using Airflow, utilize the Airflow UI, and monitor and log your data pipelines. Participate in quizzes and hands-on labs to apply your knowledge practically.
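As a rough illustration of what "directed acyclic graph" means for a pipeline, the sketch below uses only the Python standard library (`graphlib`, available from Python 3.9). It is not Airflow's API. The task names are hypothetical. It shows that declaring each task's upstream dependencies is enough to derive a valid execution order, and that a cycle makes such an order impossible.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names; each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def execution_order(deps):
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def has_cycle(deps):
    """Return True if the dependency graph is not acyclic."""
    try:
        execution_order(deps)
    except CycleError:
        return True
    return False


order = execution_order(dependencies)
print(order)
print(has_cycle({"a": {"b"}, "b": {"a"}}))
```

A scheduler such as Airflow adds much more on top of this idea, including scheduling, retries, logging, and a UI, none of which this sketch models.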

Building Streaming Pipelines using Kafka

Gain a comprehensive understanding of Apache Kafka and its role in building streaming pipelines. Explore the components of the distributed event streaming platform, learn about the Kafka streaming process, and engage in hands-on labs to work with streaming data using Kafka. Optional labs provide further exploration of Kafka's capabilities.
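The relationship between topics, partitions, producers, and consumers can be sketched with a toy in-memory model. This is not Kafka and does not use any Kafka client library. The class names, the CRC32-based key hashing, and the sensor records are all assumptions made for illustration. Brokers and replication are not modeled. The sketch shows two ideas: records with the same key land in the same partition, so their order is preserved, and a consumer tracks its own offset per partition.

```python
import zlib
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a topic: a fixed number of append-only partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Assumption: a simple CRC32 hash picks the partition. Real Kafka
        # clients use their own partitioning strategy.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append((key, value))
        return index


class ToyConsumer:
    """Reads each partition in order and remembers an offset per partition."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)

    def poll(self):
        """Return every record appended since the previous poll."""
        records = []
        for index, log in enumerate(self.topic.partitions):
            start = self.offsets[index]
            records.extend(log[start:])
            self.offsets[index] = len(log)
        return records


topic = ToyTopic(num_partitions=3)
for i in range(4):
    topic.produce("sensor-a", f"reading-{i}")
    topic.produce("sensor-b", f"reading-{i}")

consumer = ToyConsumer(topic)
first_batch = consumer.poll()
second_batch = consumer.poll()  # nothing was produced between the two polls
sensor_a_values = [value for key, value in first_batch if key == "sensor-a"]
print(len(first_batch), len(second_batch), sensor_a_values)
```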

Final Assignment

Complete a final assignment that enables you to showcase the skills acquired throughout the course. Demonstrate your proficiency in data processing, ETL, data pipelines, and streaming pipelines through a shareable project, solidifying your understanding and practical abilities.

More Data Management Courses

Databases for Data Scientists

University of Colorado Boulder

Databases for Data Scientists equips learners with relational database design skills, SQL programming expertise, and insights into future database trends.

Blue Prism Foundation Training

Blue Prism

Blue Prism Foundation Training provides a comprehensive introduction to configuring a Blue Prism Process Solution, empowering learners to build end-to-end automation....

Mejores prácticas para el procesamiento de datos en Big Data

Coursera Project Network

Mejores prácticas para el procesamiento de datos en Big Data: Learn to apply good practices for data processing in Big Data using Databricks...

ChatGPT Advanced Data Analysis

Vanderbilt University

ChatGPT Advanced Data Analysis is a transformative course, teaching how to automate tasks using natural language and AI. It empowers users to enhance productivity...