Course

Scalable Machine Learning on Big Data using Apache Spark

IBM

This course equips you with the essential skills to scale data science and machine learning tasks on Big Data sets using Apache Spark. You will gain a practical understanding of Apache Spark and learn to apply it to machine learning problems involving both small and big data. The course covers writing parallel code, using large-scale compute clusters, eliminating out-of-memory errors, and testing thousands of different ML models in parallel. Optional content covers running SQL statements on very large data sets using Apache SparkSQL and the Apache Spark DataFrame API.

Throughout the course, you will also get hands-on practice running machine learning tasks on an Apache Spark cluster provided by IBM. The course is designed for learners with basic skills in Python programming, machine learning, and SQL. Upon completion, you will be able to work with the Big Data and machine learning techniques that leading companies such as Alibaba, Apple, Amazon, and IBM apply successfully.

  • Learn to scale data science and machine learning tasks on Big Data sets using Apache Spark
  • Gain a practical understanding of Apache Spark and its application to solve machine learning problems
  • Utilize large-scale compute clusters and eliminate out-of-memory errors
  • Get hands-on practice running machine learning tasks on an Apache Spark cluster provided by IBM

Certificate Available ✔

Course Modules

This course comprises four modules aimed at equipping learners with the skills to scale data science and machine learning tasks on Big Data sets using Apache Spark.

Week 1: Introduction

Module 1 introduces Apache Spark for Machine Learning on Big Data, covering topics such as parallel data processing strategies, functional programming basics, and Apache SparkSQL. It also includes hands-on exercises and practice quizzes to reinforce your learning.
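To make the idea of functional, partition-parallel processing concrete, here is a minimal plain-Python sketch (not Spark code). The `partitions` list is a hypothetical stand-in for data that Spark would spread across workers: each partition is mapped and reduced independently, and the partial results are then combined. This works cleanly only because the combining function is associative and commutative.

```python
from functools import reduce

# Hypothetical toy data split into "partitions", standing in for the
# chunks of a distributed data set that Spark would place on different workers.
partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]

def square(x):
    """Pure function applied to every element (the 'map' step)."""
    return x * x

def add(a, b):
    """Associative, commutative combiner (the 'reduce' step)."""
    return a + b

# Each partition is processed on its own; in Spark these would run in parallel.
partial_sums = [reduce(add, map(square, part), 0) for part in partitions]

# Combine the per-partition results. Because add is associative and commutative,
# the total would be the same regardless of the order partitions are combined in.
total = reduce(add, partial_sums, 0)

print(partial_sums, total)
```

In PySpark, roughly the same computation could be written as `sc.parallelize(range(1, 10)).map(lambda x: x * x).reduce(lambda a, b: a + b)`, assuming an existing `SparkContext` named `sc`; Spark then decides how to split the data and where to run each step.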

Week 2: Scaling Math for Statistics on Apache Spark

Module 2 focuses on scaling math for statistics on Apache Spark, covering averages, standard deviation, covariance, correlation, and dimensionality reduction. You will also practice computing statistics and using the Spark API through exercises and quizzes.
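Scaling these statistics rests on the fact that each one can be derived from a few sums that every partition computes locally and that merge by simple addition. The sketch below shows this in plain Python; it is an illustration, not Spark's own implementation, and `partitions` holds hypothetical toy data. It uses the textbook one-pass formulas (variance as the mean of squares minus the squared mean), which can lose precision on large values, so production code typically uses a numerically stabler merge. Spark itself offers built-in equivalents such as `DataFrame.stat.corr` and `DataFrame.stat.cov`.

```python
import math
from functools import reduce

# Hypothetical partitions of paired observations (x, y).
partitions = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.0)],
    [(5.0, 10.0)],
]

def summarize(rows):
    """Per-partition sufficient statistics: (n, sum x, sum y, sum x^2, sum y^2, sum xy)."""
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxx = sum(x * x for x, _ in rows)
    syy = sum(y * y for _, y in rows)
    sxy = sum(x * y for x, y in rows)
    return (n, sx, sy, sxx, syy, sxy)

def merge(a, b):
    """Combine two partition summaries by elementwise addition (associative)."""
    return tuple(u + v for u, v in zip(a, b))

# One pass over each partition, then a cheap merge of the small summaries.
n, sx, sy, sxx, syy, sxy = reduce(merge, map(summarize, partitions))

mean_x = sx / n
mean_y = sy / n
var_x = sxx / n - mean_x ** 2          # population variance of x
var_y = syy / n - mean_y ** 2          # population variance of y
std_x = math.sqrt(var_x)               # population standard deviation of x
std_y = math.sqrt(var_y)
cov_xy = sxy / n - mean_x * mean_y     # population covariance
corr_xy = cov_xy / (std_x * std_y)     # Pearson correlation

print(mean_x, std_x, cov_xy, corr_xy)
```

Only the six-number summaries move between workers, which is why this pattern stays cheap no matter how many rows each partition holds.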

Week 3: Introduction to Apache SparkML

Module 3 introduces Apache SparkML, explaining how ML pipelines work and presenting core SparkML concepts, with practical exercises on modifying Apache SparkML feature engineering pipelines and on clustering.

Week 4: Supervised and Unsupervised Learning with SparkML

Module 4 explores supervised and unsupervised learning with SparkML, covering linear regression, logistic regression, and techniques for improving classification performance. You will also complete a course project and quizzes to apply and test your understanding of SparkML algorithms.

More Machine Learning Courses

Preparing for Google Cloud Certification: Cloud Data Engineer

Google Cloud

Preparing for Google Cloud Certification: Cloud Data Engineer equips learners with the skills to excel in cloud data engineering. Gain hands-on experience and prepare...

Demand Forecasting Using Time Series

LearnQuest

Demand Forecasting Using Time Series is a specialized course focusing on building ARIMA models in Python for demand prediction and exploring advanced neural networks...

Launching into Machine Learning

Google Cloud

Launching into Machine Learning offers a comprehensive exploration of data improvement, ML model building, and evaluation. The course equips learners with practical...

Procesamiento de Lenguaje Natural

Universidad Austral

The Procesamiento de Lenguaje Natural (Natural Language Processing) course provides comprehensive knowledge for developing NLP applications and creating your own NLP environment.