Course

Data Manipulation at Scale: Systems and Algorithms

University of Washington

Data Manipulation at Scale: Systems and Algorithms is a University of Washington course on the challenges and methods of data science at realistic scales. It surveys the landscape of relevant systems, their principles and tradeoffs, and how to evaluate them against specific requirements. By the end of the course, participants will be able to:

  • Describe common patterns, challenges, and approaches associated with data science projects
  • Identify and use programming models for scalable data manipulation, including relational algebra, MapReduce, and other data flow models
  • Evaluate key-value stores and NoSQL systems, understanding their tradeoffs and future trends
  • Write programs in Spark and understand the associated ecosystem of algorithms, extensions, and languages
  • Describe the landscape of specialized Big Data systems for graphs, arrays, and streams

This course delves into a wide range of topics, including data science context, relational databases and the relational algebra, MapReduce and parallel dataflow programming, NoSQL systems and concepts, and graph analytics. It is designed to equip learners with the necessary skills and knowledge to effectively work with large, heterogeneous, and noisy datasets, providing a solid foundation for data science at realistic scales.

Certificate Available ✔

Course Modules

This course covers diverse modules including Data Science Context and Concepts, Relational Databases and the Relational Algebra, MapReduce and Parallel Dataflow Programming, NoSQL: Systems and Concepts, and Graph Analytics.

Data Science Context and Concepts

Data Science Context and Concepts provides an introduction to the course, discussing the characteristics and dimensions of data science, tools vs. abstractions, and the history and context of data science.

Relational Databases and the Relational Algebra

The Relational Databases and the Relational Algebra module covers data models, relational databases, and SQL, along with related topics such as user-defined functions, optimization, and declarative languages.
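
To give a feel for what relational algebra means in practice, here is a minimal sketch of three operators (selection, projection, natural join) over relations modeled as Python lists of dicts. The function names and the `students` and `enrolled` data are hypothetical and chosen for illustration; they are not taken from the course materials.

```python
# Illustrative only: relations are lists of dicts; all names and data are hypothetical.

def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Projection: keep only the named attributes and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

def natural_join(left, right):
    """Natural join: combine rows that agree on every shared attribute."""
    result = []
    for l_row in left:
        for r_row in right:
            shared = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in shared):
                result.append({**l_row, **r_row})
    return result

students = [
    {"sid": 1, "name": "Ada"},
    {"sid": 2, "name": "Grace"},
]
enrolled = [
    {"sid": 1, "course": "Databases"},
    {"sid": 2, "course": "Graphs"},
    {"sid": 1, "course": "Graphs"},
]

# Roughly: SELECT DISTINCT name FROM students NATURAL JOIN enrolled
#          WHERE course = 'Graphs'
graph_students = project(
    select(natural_join(students, enrolled), lambda r: r["course"] == "Graphs"),
    ["name"],
)
print(graph_students)
```

The nested-loop join is the simplest possible strategy; a real database engine would typically choose among several join algorithms, which is one reason declarative queries leave room for optimization.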

MapReduce and Parallel Dataflow Programming

MapReduce and Parallel Dataflow Programming covers scalable and data-parallel algorithms, the MapReduce data model, and a comparison of RDBMS with Hadoop, concluding with a hands-on exercise, "Thinking in MapReduce."
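
As a rough illustration of the MapReduce programming model, the sketch below runs a word count in a single Python process. It is not Hadoop code and is not taken from the course exercise; `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical names that only mimic the map, group-by-key, and reduce steps.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["the quick fox", "the lazy dog", "The end"])
print(counts)
```

In a real cluster the map and reduce calls would run in parallel on different machines, and the shuffle would move data across the network; the structure of the program stays the same.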

NoSQL: Systems and Concepts

NoSQL: Systems and Concepts explores the context in which NoSQL emerged, the types of NoSQL systems, their consistency guarantees, and specific systems such as Memcached, DynamoDB, CouchDB, BigTable, HBase, and Spanner.

Graph Analytics

Graph Analytics provides an overview of graph analysis, structural analysis, centrality, PageRank, traversal tasks, querying edge tables, and representation in MapReduce and Pregel.
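
For readers unfamiliar with PageRank, the following is a minimal sketch of the usual power-iteration formulation on a small edge table. The function name, the damping factor of 0.85, the iteration count, and the four-node example graph are all assumptions made for illustration, not details from the course.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start from a uniform distribution over all nodes.
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Every node receives the "random jump" share first.
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for src in nodes:
            # A node with no outgoing edges spreads its rank over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank

# Hypothetical edge table: a -> b -> c -> a, plus d -> c.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")]
ranks = pagerank(edges)
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(node, round(score, 3))
```

Each iteration only needs every node's current rank and its outgoing edges, which is why the same computation maps naturally onto per-vertex or per-edge processing in systems like MapReduce and Pregel.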

More Data Analysis Courses

Big Data

University of California San Diego

The Big Data course offers hands-on experience with tools and systems used by big data scientists and engineers, providing insights into real-world problems and questions....

Sports Performance Analytics

University of Michigan

Sports Performance Analytics provides an in-depth exploration of sports analytics, using real data sets from various sports leagues to construct predictive models...

Introduction to Designing Data Lakes on AWS

Amazon Web Services

Prepare to design and operate a secure and scalable data lake on AWS without prior data science knowledge.

SQL: A Practical Introduction for Querying Databases

IBM

SQL: A Practical Introduction for Querying Databases is a comprehensive course that equips learners with foundational and intermediate SQL knowledge necessary for...