Course

Fundamentals of Scalable Data Science

IBM

"Fundamentals of Scalable Data Science," offered by IBM, introduces Apache Spark, Python, and PySpark so that you can process large-scale data effectively. Over four weeks, you will work with statistical measures, data visualization, and big data techniques, building the skills needed to handle big data and advance your career in data science.

Key course components include:

Introduction to Apache Spark and its applications
Exploration of big data tools and programming languages
Scaling mathematical statistics on Apache Spark
Data visualization of big data using Apache Spark and Python's matplotlib

Certificate Available ✔

This course covers the basics of Apache Spark, introduces big data tools and programming languages, shows how to scale statistical calculations, and teaches you to visualize big data with Apache Spark and Python's matplotlib.

Introduction to the course and grading environment

This module gives a course overview and a warm welcome to the program, along with an introduction to Apache Spark and the other technology used in the course. It also covers the grading environment and how to submit programming assignments.

Tools that support BigData solutions

Explore data storage solutions, the parallel data processing strategies of Apache Spark, programming language options, functional programming basics, and an introduction to IBM Cloudant. The module also covers Resilient Distributed Datasets (RDDs) and DataFrames, an optional test data generator, and Apache Parquet.
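
The functional style behind Spark's parallel processing can be illustrated without a cluster. The sketch below is plain Python, not PySpark: the hypothetical helpers `partition`, `partial_sum_count`, and `combine` stand in for Spark partitions, a map step, and a reduce step. PySpark's RDD exposes `map` and `reduce` methods with the same shape, but this block does not call them.

```python
from functools import reduce

def partition(data, n):
    """Split a list into n roughly equal chunks (a stand-in for Spark partitions)."""
    size = -(-len(data) // n)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def partial_sum_count(chunk):
    """Map step: each partition independently reports its (sum, count) pair."""
    return (sum(chunk), len(chunk))

def combine(a, b):
    """Reduce step: merge two (sum, count) pairs; the merge is associative and commutative."""
    return (a[0] + b[0], a[1] + b[1])

data = [float(x) for x in range(1, 101)]
partials = map(partial_sum_count, partition(data, 4))
total, count = reduce(combine, partials)
print(total / count)  # 50.5
```

Because `combine` is associative and commutative, the partial results can be merged in any order. This property lets a framework such as Spark compute each partition on a different worker.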

Scaling Math for Statistics on Apache Spark

Learn how to scale statistical calculations on Apache Spark, covering averages, standard deviation, skewness, kurtosis, covariance, covariance matrices, correlation, and multidimensional vector spaces. The module includes exercises on averages and standard deviation; on skewness and kurtosis; and on covariance, correlation, and multidimensional vector spaces.
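
These statistics scale because each one can be derived from sums that every partition computes on its own. The plain-Python sketch below is an illustration under that assumption, not the course's code and not a Spark API. It computes power sums per partition, adds them together, and then recovers the mean, population standard deviation, skewness, kurtosis, covariance, and correlation. The function names are hypothetical.

```python
import math
from functools import reduce

def partial_sums(chunk):
    """Map step: per-partition count, power sums of x up to x**4, and y / x*y sums."""
    return (
        len(chunk),
        sum(x for x, _ in chunk),
        sum(x ** 2 for x, _ in chunk),
        sum(x ** 3 for x, _ in chunk),
        sum(x ** 4 for x, _ in chunk),
        sum(y for _, y in chunk),
        sum(y ** 2 for _, y in chunk),
        sum(x * y for x, y in chunk),
    )

def combine(a, b):
    """Reduce step: sums are additive, so partitions merge element-wise."""
    return tuple(p + q for p, q in zip(a, b))

def statistics(partitions):
    n, s1, s2, s3, s4, sy, syy, sxy = reduce(combine, map(partial_sums, partitions))
    mean = s1 / n
    # Central moments recovered from raw moments (population definitions).
    m2 = s2 / n - mean ** 2
    m3 = s3 / n - 3 * mean * s2 / n + 2 * mean ** 3
    m4 = s4 / n - 4 * mean * s3 / n + 6 * mean ** 2 * s2 / n - 3 * mean ** 4
    mean_y = sy / n
    var_y = syy / n - mean_y ** 2
    cov = sxy / n - mean * mean_y
    return {
        "mean": mean,
        "std": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # plain kurtosis; subtract 3 for excess kurtosis
        "covariance": cov,
        "correlation": cov / math.sqrt(m2 * var_y),
    }

pairs = [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10)]
stats = statistics([pairs[:2], pairs[2:]])  # two hypothetical partitions
print(stats)
```

Raw-moment formulas like these can lose precision when values are large relative to their spread. Production code usually prefers numerically stabler merge formulas, such as Welford- or Chan-style updates, which follow the same partition-then-combine pattern.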

Data Visualization of Big Data

Learn about visualizing big data, including plotting with Apache Spark and Python's matplotlib, dimensionality reduction, and principal component analysis (PCA). The module also includes exercises on plotting and PCA, setup of the assignment and exercise environment, and optional labs for setting up programming assignments in Watson Studio.
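
PCA reduces dimensionality by projecting the data onto the eigenvectors of its covariance matrix that have the largest eigenvalues. The NumPy sketch below runs in memory on synthetic data; it is not the course's implementation. The function name `pca` and the generated dataset are assumptions made for illustration. PySpark also ships a distributed version as `pyspark.ml.feature.PCA`.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto their top-k principal components."""
    X_centered = X - X.mean(axis=0)          # PCA assumes zero-mean columns
    cov = np.cov(X_centered, rowvar=False)   # d x d sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest eigenvalues
    components = eigvecs[:, order]           # columns are the principal directions
    return X_centered @ components, eigvals[order]

# Synthetic 3-D data whose variance lies mostly along one direction.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200), 0.01 * rng.normal(size=200)])
projected, variances = pca(X, 2)
print(projected.shape)  # (200, 2)
```

The 2-D result in `projected` can then be drawn with matplotlib. Each returned eigenvalue equals the sample variance of the corresponding projected column.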
