Data Manipulation at Scale: Systems and Algorithms is an in-depth course offered by the University of Washington that focuses on the challenges and methods of data science at realistic scales. It surveys the landscape of relevant systems, the principles behind them, their tradeoffs, and how to evaluate them against specific requirements. By the end of the course, participants will be able to:
This course delves into a wide range of topics, including data science context, relational databases and the relational algebra, MapReduce and parallel dataflow programming, NoSQL systems and concepts, and graph analytics. It is designed to equip learners with the necessary skills and knowledge to effectively work with large, heterogeneous, and noisy datasets, providing a solid foundation for data science at realistic scales.
Certificate Available ✔
This course covers diverse modules including Data Science Context and Concepts, Relational Databases and the Relational Algebra, MapReduce and Parallel Dataflow Programming, NoSQL: Systems and Concepts, and Graph Analytics.
Data Science Context and Concepts provides an introduction to the course, discussing the characteristics and dimensions of data science, tools vs. abstractions, and the history and context of data science.
The Relational Databases and the Relational Algebra module covers data models, relational databases, SQL, and practical topics such as user-defined functions, optimization, and declarative languages.
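As a rough illustration of the relational algebra operators this module refers to, the sketch below implements selection, projection, and an equi-join over relations represented as Python lists of dicts. The representation, the function names (`select`, `project`, `join`), and the sample `employees`/`depts` data are all hypothetical and chosen for illustration; a real RDBMS implements these operators very differently.

```python
from typing import Callable

# Hypothetical representation: a relation is a list of dicts, one dict per tuple.
Row = dict[str, object]

def select(rel: list[Row], pred: Callable[[Row], bool]) -> list[Row]:
    """Selection (sigma): keep only tuples that satisfy the predicate."""
    return [r for r in rel if pred(r)]

def project(rel: list[Row], attrs: list[str]) -> list[Row]:
    """Projection (pi): keep the named attributes and drop duplicates (set semantics)."""
    seen = set()
    out = []
    for r in rel:
        key = tuple(r[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def join(left: list[Row], right: list[Row], attr: str) -> list[Row]:
    """Equi-join on one shared attribute, using a hash index built on the right input."""
    index: dict[object, list[Row]] = {}
    for r in right:
        index.setdefault(r[attr], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[attr], [])]

# Hypothetical sample data.
employees = [
    {"name": "Ana", "dept": 1, "salary": 120},
    {"name": "Bo", "dept": 2, "salary": 90},
    {"name": "Cy", "dept": 1, "salary": 80},
]
depts = [{"dept": 1, "dname": "Eng"}, {"dept": 2, "dname": "Ops"}]

# pi_name( sigma_{dname = 'Eng' AND salary > 100}( employees JOIN depts ) )
result = project(
    select(join(employees, depts, "dept"),
           lambda r: r["dname"] == "Eng" and r["salary"] > 100),
    ["name"],
)
print(result)  # [{'name': 'Ana'}]
```

Composing the operators as nested function calls mirrors how a relational algebra expression nests, which is the same structure a query optimizer rewrites when it reorders selections and joins.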
MapReduce and Parallel Dataflow Programming covers scalable and data-parallel algorithms, the MapReduce data model, and a comparison of RDBMSs with Hadoop, concluding with a hands-on exercise, "Thinking in MapReduce."
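To give a sense of what "thinking in MapReduce" involves, here is a minimal single-process simulation of the classic word-count job: a map function emits (key, value) pairs, a shuffle step groups values by key, and a reduce function aggregates each group. This is not Hadoop's API; `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, and the shuffle that a real framework performs across machines is done here with an in-memory dictionary.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_fn(doc_id: str, text: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input document."""
    for word in text.lower().split():
        yield word, 1

def reduce_fn(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce: sum all partial counts emitted for one word."""
    return word, sum(counts)

def run_mapreduce(inputs: dict[str, str]) -> dict[str, int]:
    # Map + shuffle: group intermediate values by key (a real framework
    # partitions and sorts these across workers).
    groups: dict[str, list[int]] = defaultdict(list)
    for doc_id, text in inputs.items():
        for key, value in map_fn(doc_id, text):
            groups[key].append(value)
    # Reduce: each key is handled independently, so this loop is the
    # part a framework could run in parallel.
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

docs = {"d1": "the cat sat", "d2": "the dog sat down"}
counts = run_mapreduce(docs)
print(counts)  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1, 'down': 1}
```

The key design point is that neither `map_fn` nor `reduce_fn` shares state across calls, which is what lets a framework distribute them over many machines.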
NoSQL: Systems and Concepts explores the context for NoSQL, the types of NoSQL systems, their consistency guarantees, and specific systems such as Memcached, DynamoDB, CouchDB, BigTable, HBase, and Spanner.
Graph Analytics provides an overview of graph analysis, structural analysis, centrality, PageRank, traversal tasks, querying edge tables, and representing graphs in MapReduce and Pregel.
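PageRank is one of the graph topics listed above. The sketch below computes it by simple power iteration on a single machine, assuming an adjacency-list dict in which every link target also appears as a key. It is not a MapReduce or Pregel implementation; the function name, the damping factor of 0.85, the fixed iteration count, and the sample graph are illustrative assumptions.

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             iters: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over an adjacency list (assumes all targets are keys)."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from a uniform distribution
    for _ in range(iters):
        # Every node receives the "random jump" share.
        new = {v: (1.0 - damping) / n for v in nodes}
        for v, outs in graph.items():
            if outs:
                # Split v's rank evenly among its out-links.
                share = damping * rank[v] / len(outs)
                for w in outs:
                    new[w] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new
    return rank

# Hypothetical three-node graph: a -> b, a -> c, b -> c, c -> a.
ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print({k: round(v, 3) for k, v in ranks.items()})
```

Each iteration only needs every node's current rank and its out-links, which is why the same computation maps naturally onto a MapReduce job per iteration or onto Pregel's vertex-centric message passing.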