Course

Scalable Machine Learning on Big Data using Apache Spark

IBM

This course equips you with the essential skills to scale data science and machine learning tasks on Big Data sets using Apache Spark. You will gain a practical understanding of Apache Spark and learn to apply it to machine learning problems on both small and large data sets. The course covers writing parallel code, using large-scale compute clusters, eliminating out-of-memory errors, and testing thousands of different ML models in parallel. Optional content includes running SQL statements on very large data sets using Apache SparkSQL and the Apache Spark DataFrame API.

Throughout the course, you will also get hands-on practice running machine learning tasks on an Apache Spark cluster provided by IBM. The course is designed for individuals with basic Python programming, machine learning, and SQL skills. Upon completion, you will be able to work with the Big Data and machine learning techniques applied by leading companies such as Alibaba, Apple, Amazon, IBM, and many others.

Learn to scale data science and machine learning tasks on Big Data sets using Apache Spark
Gain a practical understanding of Apache Spark and its application to solve machine learning problems
Utilize large-scale compute clusters and eliminate out-of-memory errors
Get hands-on practice running machine learning tasks on an Apache Spark cluster provided by IBM

Certificate Available ✔

This course comprises four modules aimed at equipping learners with the skills to scale data science and machine learning tasks on Big Data sets using Apache Spark.

Week 1: Introduction

Module 1 introduces Apache Spark for Machine Learning on Big Data, covering topics such as parallel data processing strategies, functional programming basics, and Apache SparkSQL. It also includes hands-on exercises and practice quizzes to reinforce your learning.
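As a rough illustration of the functional style this module refers to, the following is a minimal sketch in plain Python. It is not the Spark API and is not taken from the course materials. The helper names `partition` and `sum_of_squares` are hypothetical, and the in-memory lists only stand in for partitions that a cluster would process on separate workers.

```python
from functools import reduce

def partition(data, num_partitions):
    """Split a list into chunks; a stand-in for partitions spread over a cluster."""
    return [data[i::num_partitions] for i in range(num_partitions)]

def sum_of_squares(chunk):
    """Map each value to its square, then reduce the squares to a single sum."""
    squares = map(lambda x: x * x, chunk)
    return reduce(lambda acc, x: acc + x, squares, 0)

data = list(range(1, 11))

# Each chunk is processed independently, so this step could run in parallel.
partials = [sum_of_squares(chunk) for chunk in partition(data, 3)]

# Combining the partial results needs only the small per-chunk sums.
total = reduce(lambda a, b: a + b, partials, 0)
print(total)  # 385
```

The point of the sketch is that map and reduce steps have no shared state, which is what lets a framework distribute them across machines.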

Week 2: Scaling Math for Statistics on Apache Spark

Module 2 focuses on scaling math for statistics on Apache Spark, delving into averages, standard deviation, covariance, correlation, and dimensionality reduction. You will also practice statistics and API usage on Spark through exercises and quizzes.
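One common way to scale the statistics named above is to compute small partial sums per partition and then combine them. The following is a minimal sketch of that idea in plain Python, not the Spark API and not the course's own code. The function names are hypothetical, the formulas are the population versions (dividing by n), and the sum-of-squares form used here can lose precision on large or poorly scaled data.

```python
import math
from functools import reduce

def partial_sums(pairs):
    """Summarize one partition of (x, y) pairs as six running sums."""
    n = 0
    sx = sy = sxx = syy = sxy = 0.0
    for x, y in pairs:
        n += 1
        sx += x
        sy += y
        sxx += x * x
        syy += y * y
        sxy += x * y
    return (n, sx, sy, sxx, syy, sxy)

def combine(a, b):
    """Merge two partition summaries by adding them element-wise."""
    return tuple(p + q for p, q in zip(a, b))

def summarize(stats):
    """Derive mean, standard deviation, covariance, and correlation."""
    n, sx, sy, sxx, syy, sxy = stats
    mean_x, mean_y = sx / n, sy / n
    var_x = sxx / n - mean_x ** 2
    var_y = syy / n - mean_y ** 2
    cov = sxy / n - mean_x * mean_y
    denom = math.sqrt(var_x * var_y)
    # Correlation is undefined when either variable is constant.
    corr = cov / denom if denom > 0 else float("nan")
    return {
        "mean_x": mean_x,
        "std_x": math.sqrt(var_x),
        "cov": cov,
        "corr": corr,
    }

# Two hypothetical partitions of the same data set.
partitions = [[(1, 2), (2, 4)], [(3, 6), (4, 8)]]
combined = reduce(combine, (partial_sums(p) for p in partitions))
result = summarize(combined)
print(result)
```

Because `combine` is associative, the partition summaries can be merged in any order, so only six numbers per partition ever need to be moved between workers.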

Week 3: Introduction to Apache SparkML

Module 3 introduces Apache SparkML, explaining how ML pipelines work and covering core SparkML concepts. It includes practical exercises on modifying Apache SparkML feature engineering pipelines and working with clustering.

Week 4: Supervised and Unsupervised Learning with SparkML

Module 4 explores supervised and unsupervised learning with SparkML, covering topics such as linear regression, logistic regression, and improving classification performance. You will also complete a course project and quizzes to apply and test your understanding of SparkML algorithms.
