Course

NoSQL, Big Data, and Spark Foundations

IBM

Big Data Engineers and professionals with NoSQL skills are highly sought after in the data management industry. This Specialization is designed for those seeking to develop fundamental skills for working with Big Data, Apache Spark, and NoSQL databases.

The course covers popular NoSQL databases like MongoDB and Apache Cassandra, the widely used Apache Hadoop ecosystem of Big Data tools, and the Apache Spark analytics engine for large-scale data processing.

Learn to work with NoSQL databases for data management tasks such as creating & replicating databases, inserting, updating, deleting, querying, indexing, aggregating & sharding data.
Gain fundamental knowledge of Big Data technologies such as Hadoop, MapReduce, HDFS, Hive, and HBase, followed by a more in-depth working knowledge of Apache Spark, Spark DataFrames, Spark SQL, PySpark, the Spark Application UI, and scaling Spark with Kubernetes.
Develop hands-on experience performing Extract, Transform and Load (ETL) processing and Machine Learning model training and deployment with Apache Spark.

Certificate Available ✔

This specialization covers the fundamentals of NoSQL databases, Big Data with Spark and Hadoop, and Machine Learning with Apache Spark, providing a comprehensive understanding and practical experience in these areas.

Introduction to NoSQL Databases

Differentiate between the four main categories of NoSQL repositories. Describe the characteristics, features, benefits, limitations, and applications of the more popular Big Data processing tools. Perform common MongoDB tasks, including create, read, update, and delete (CRUD) operations. Execute keyspace, table, and CRUD operations in Cassandra.

Introduction to Big Data with Spark and Hadoop

Explain the impact of big data, including use cases, tools, and processing methods. Describe Apache Hadoop architecture, ecosystem, practices, and user-related applications, including Hive, HDFS, HBase, Spark, and MapReduce. Apply Spark programming basics, including parallel programming with DataFrames, Datasets, and Spark SQL. Use Spark’s RDDs and Datasets, optimize Spark SQL using Catalyst and Tungsten, and use Spark’s development and runtime environment options.

Machine Learning with Apache Spark

Describe ML, explain its role in data engineering, summarize generative AI, discuss Spark's uses, and analyze ML pipelines and model persistence. Evaluate ML models, distinguish between regression, classification, and clustering models, and compare data engineering pipelines with ML pipelines. Construct data analysis processes using Spark SQL, and perform regression, classification, and clustering using SparkML. Connect to Spark clusters, build ML pipelines, perform feature extraction and transformation, and persist models.
