Course

Fundamentals of Scalable Data Science

IBM

The "Fundamentals of Scalable Data Science" course from IBM introduces Apache Spark, Python, and PySpark for processing large-scale data. Over the four-week program, you will cover statistical measures, data visualization, and big data techniques, building the skills needed to work with big data and advance a career in data science.

Key course components include:

  • Introduction to Apache Spark and its applications
  • Exploration of big data tools and programming languages
  • Scaling mathematical statistics on Apache Spark
  • Data visualization of big data using Apache Spark and Python's matplotlib

Certificate Available ✔

Course Modules

Master Apache Spark and big data analysis with this course. Learn the basics of Apache Spark, explore big data tools and programming languages, scale mathematical statistics, and visualize big data using Apache Spark and Python's matplotlib.

Introduction to the course and grading environment

A course overview and welcome to the program, with an introduction to Apache Spark and the other technology used in the course. The module also covers the grading environment and how to submit programming assignments.

Tools that support BigData solutions

Explore data storage solutions, the parallel data processing strategies of Apache Spark, programming language options, functional programming basics, and an introduction to Cloudant. The module also covers Resilient Distributed Datasets (RDDs) and DataFrames, an optional test data generator, and Apache Parquet.
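As a rough illustration of the functional programming basics this module mentions, the sketch below uses plain Python rather than Spark itself. The sensor readings and the sentinel value are made up for illustration and are not taken from the course material.

```python
from functools import reduce

# Hypothetical sensor readings in Celsius; -999.0 marks a missing value.
# These numbers are assumptions made for this example only.
readings = [21.5, 22.0, -999.0, 23.5, 21.0]

# filter: keep only the readings that are not the missing-value sentinel
valid = list(filter(lambda x: x > -100.0, readings))

# map: apply the same pure function (Celsius to Fahrenheit) to every element
fahrenheit = list(map(lambda c: c * 9.0 / 5.0 + 32.0, valid))

# reduce: fold all elements into one value, here their sum
total = reduce(lambda a, b: a + b, fahrenheit)
mean_f = total / len(fahrenheit)

print(valid)
print(mean_f)
```

Because each step is a function applied to elements without shared mutable state, the same pattern can in principle be split across many workers, which is the idea parallel engines such as Apache Spark build on.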

Scaling Math for Statistics on Apache Spark

Learn how to scale the math behind statistics on Apache Spark, covering averages, standard deviation, skewness, kurtosis, covariance, covariance matrices, correlation, and multidimensional vector spaces. The module also includes exercises on averages and standard deviation, on skewness and kurtosis, and on covariance, correlation, and multidimensional vector spaces.
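One way to see why these statistics scale is that each can be derived from sums that are computed per partition and then added together. The sketch below is a plain-Python illustration under that assumption, not the course's own code: the function names `partial_sums`, `combine`, and `moments` are hypothetical, the standard deviation is the population (not sample) version, and the kurtosis is the non-excess form.

```python
from functools import reduce
import math

def partial_sums(chunk):
    """Summarise one partition as (count, sum x, sum x^2, sum x^3, sum x^4)."""
    return (
        len(chunk),
        sum(chunk),
        sum(x ** 2 for x in chunk),
        sum(x ** 3 for x in chunk),
        sum(x ** 4 for x in chunk),
    )

def combine(a, b):
    """Merge two partition summaries by element-wise addition."""
    return tuple(x + y for x, y in zip(a, b))

def moments(partitions):
    """Compute mean, std, skewness, and kurtosis from merged partial sums."""
    n, s1, s2, s3, s4 = reduce(combine, map(partial_sums, partitions))
    mean = s1 / n
    # Central moments expressed through raw moments.
    m2 = s2 / n - mean ** 2
    m3 = s3 / n - 3 * mean * s2 / n + 2 * mean ** 3
    m4 = s4 / n - 4 * mean * s3 / n + 6 * mean ** 2 * s2 / n - 3 * mean ** 4
    std = math.sqrt(m2)
    return {
        "mean": mean,
        "std": std,
        "skewness": m3 / std ** 3,
        "kurtosis": m4 / m2 ** 2,
    }

# Two hypothetical partitions of one data set.
stats = moments([[1.0, 2.0, 3.0], [4.0, 5.0]])
print(stats)
```

The raw-moment formulas are simple but can lose precision on data with a large mean and small variance, so this is a teaching sketch rather than a numerically robust implementation.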

Data Visualization of Big Data

Learn about data visualization of big data, including plotting with Apache Spark and Python's matplotlib, dimensionality reduction, and PCA. The module also encompasses exercises on plotting and PCA, assignment and exercise environment setup, and optional labs for setting up programming assignments in Watson Studio.

More Data Analysis Courses

IBM Data Analyst

IBM

Prepare for a career in data analytics with the IBM Data Analyst program. Gain essential skills in Python, Excel, and SQL to kickstart your career in as little as...

Clinical Natural Language Processing

University of Colorado System

Clinical Natural Language Processing is a comprehensive course covering the fundamentals of NLP and practical techniques for text processing, culminating in a real-world...

Proceso de datos sucios a datos limpios

Google

Proceso de datos sucios a datos limpios provides essential skills for beginner data analysts, including checking and cleaning data with sheets...

Data Storytelling

Fractal Analytics

Data Storytelling equips you with the principles and techniques of storytelling, coupled with data visualization skills, to craft captivating stories that engage...