The "Fundamentals of Scalable Data Science" course offered by IBM introduces Apache Spark, Python, and PySpark for processing large-scale data. Over the four-week program, you will work with statistical measures, data visualization, and big data techniques, building the skills needed to handle big data and advance your career in data science.
Key course components include:
Certificate Available ✔
Master Apache Spark and big data analysis with this course. Learn the basics of Apache Spark, explore big data tools and programming languages, scale mathematical statistics, and visualize big data using Apache Spark and Python's matplotlib.
Course overview: a welcome to the program and an introduction to Apache Spark and the technology used in the course. The module also covers the grading environment and how to submit programming assignments.
Explore data storage solutions, the parallel data processing strategies of Apache Spark, programming language options, functional programming basics, and an introduction to Cloudant. The module also covers Resilient Distributed Datasets (RDDs) and DataFrames, an optional test data generator, and Apache Parquet.
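The functional programming basics mentioned above can be sketched in plain Python, without Spark. This is only an illustration: the sample values and variable names are hypothetical and not taken from the course. PySpark's RDD API exposes operations with the same names (map, filter, reduce), which is why this style carries over to parallel processing.

```python
from functools import reduce

# Hypothetical sample values, standing in for one small dataset.
readings = [12.5, 7.0, 19.25, 3.5, 15.0]

# map: transform every element independently of the others.
doubled = list(map(lambda x: x * 2, readings))

# filter: keep only the elements that satisfy a predicate.
large = list(filter(lambda x: x > 10, readings))

# reduce: fold all elements into one value with an associative function.
total = reduce(lambda a, b: a + b, readings)
```

Because each map and filter call touches one element at a time and the reduce function is associative, the work can be split across partitions and the partial results combined afterwards.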
Understand how to scale statistical math on Apache Spark, covering averages, standard deviation, skewness, kurtosis, covariance, covariance matrices, correlation, and multidimensional vector spaces. The module also includes exercises on averages and standard deviation, on skewness and kurtosis, and on covariance, correlation, and multidimensional vector spaces.
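As a rough illustration of how such statistics can be computed at scale, the sketch below derives the mean, standard deviation, skewness, and kurtosis from per-partition sums of powers that are merged in a reduce step. It uses plain Python rather than Spark, uses population (not sample) formulas, and the function names and partition layout are hypothetical, not taken from the course.

```python
from functools import reduce
from math import sqrt

def partial_sums(partition):
    """Return (n, sum x, sum x^2, sum x^3, sum x^4) for one partition."""
    return (
        len(partition),
        sum(partition),
        sum(x ** 2 for x in partition),
        sum(x ** 3 for x in partition),
        sum(x ** 4 for x in partition),
    )

def combine(a, b):
    """Merge two partial results by element-wise addition."""
    return tuple(x + y for x, y in zip(a, b))

def moments(partitions):
    """Population mean, std, skewness, and kurtosis over all partitions."""
    n, s1, s2, s3, s4 = reduce(combine, map(partial_sums, partitions))
    mean = s1 / n
    # Central moments expressed through the raw moments s_k / n.
    m2 = s2 / n - mean ** 2
    m3 = s3 / n - 3 * mean * s2 / n + 2 * mean ** 3
    m4 = s4 / n - 4 * mean * s3 / n + 6 * mean ** 2 * s2 / n - 3 * mean ** 4
    std = sqrt(m2)
    return {
        "mean": mean,
        "std": std,
        "skewness": m3 / std ** 3,
        "kurtosis": m4 / m2 ** 2,
    }
```

For example, `moments([[1, 2], [3, 4, 5]])` treats the two inner lists as separate partitions and gives the same result as computing over `[1, 2, 3, 4, 5]` directly. Raw-moment formulas like these can lose precision when the mean is large relative to the spread, so this is a teaching sketch rather than a numerically robust implementation.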
Learn to visualize big data, including plotting with Apache Spark and Python's matplotlib, dimensionality reduction, and principal component analysis (PCA). The module also includes exercises on plotting and PCA, assignment and exercise environment setup, and optional labs for setting up programming assignments in Watson Studio.
Prepare for a career in data analytics with the IBM Data Analyst program. Gain essential skills in Python, Excel, and SQL to kickstart your career in as little as...
Clinical Natural Language Processing is a comprehensive course covering the fundamentals of NLP and practical techniques for text processing, culminating in a real-world...
Proceso de datos sucios a datos limpios (Process Data from Dirty to Clean) provides essential skills for beginner data analysts, including checking and cleaning data with sheets...
Data Storytelling equips you with the principles and techniques of storytelling, coupled with data visualization skills, to craft captivating stories that engage...