
NoSQL, Big Data, and Spark Foundations

IBM

Big Data Engineers and professionals with NoSQL skills are highly sought after in the data management industry. This Specialization is designed for those seeking to develop fundamental skills for working with Big Data, Apache Spark, and NoSQL databases.

The course covers popular NoSQL databases like MongoDB and Apache Cassandra, the widely used Apache Hadoop ecosystem of Big Data tools, and the Apache Spark analytics engine for large-scale data processing.

  • Learn to work with NoSQL databases for data management tasks such as creating & replicating databases, inserting, updating, deleting, querying, indexing, aggregating & sharding data.
  • Gain fundamental knowledge of Big Data technologies such as Hadoop, MapReduce, HDFS, Hive, and HBase, followed by a more in-depth working knowledge of Apache Spark, Spark Dataframes, Spark SQL, PySpark, the Spark Application UI, and scaling Spark with Kubernetes.
  • Develop hands-on experience performing Extract, Transform and Load (ETL) processing and Machine Learning model training and deployment with Apache Spark.

Certificate Available ✔

Course Modules

This specialization covers the fundamentals of NoSQL databases, Big Data with Spark and Hadoop, and Machine Learning with Apache Spark, providing a comprehensive understanding and practical experience in these areas.

Introduction to NoSQL Databases

Differentiate between the four main categories of NoSQL repositories. Describe the characteristics, features, benefits, limitations, and applications of the more popular Big Data processing tools. Perform common MongoDB tasks, including create, read, update, and delete (CRUD) operations. Execute keyspace, table, and CRUD operations in Cassandra.

Introduction to Big Data with Spark and Hadoop

Explain the impact of big data, including use cases, tools, and processing methods. Describe the Apache Hadoop architecture, ecosystem, and practices, along with related applications, including Hive, HDFS, HBase, Spark, and MapReduce. Apply Spark programming basics, including parallel programming with DataFrames, datasets, and Spark SQL. Use Spark's RDDs and datasets, optimize Spark SQL using Catalyst and Tungsten, and use Spark's development and runtime environment options.

Machine Learning with Apache Spark

Describe ML, explain its role in data engineering, summarize generative AI, discuss Spark's uses, and analyze ML pipelines and model persistence. Evaluate ML models, distinguish between regression, classification, and clustering models, and compare data engineering pipelines with ML pipelines. Construct data analysis processes using Spark SQL, and perform regression, classification, and clustering using SparkML. Demonstrate connecting to Spark clusters, build ML pipelines, perform feature extraction and transformation, and persist models.

More Data Management Courses

Cognitive Solutions and RPA Analytics

Automation Anywhere

Explore the Cognitive Solutions and RPA Analytics course to understand the role of cognitive automation and RPA analytics in processing unstructured data.

Introduction to AWS Elastic File System

Coursera Project Network

Introduction to AWS Elastic File System provides hands-on experience in creating and configuring file systems in AWS, allowing seamless access from multiple instances.

Retrieve Data with Multiple-Table SQL Queries

Coursera Project Network


Achieving Advanced Insights with BigQuery - Português

Google Cloud

Achieving Advanced Insights with BigQuery is a comprehensive course in Portuguese that delves into advanced SQL functions, query optimization, and data access control.