Course

Sample-based Learning Methods

Alberta Machine Intelligence Institute & University of Alberta

In this course on sample-based learning methods in reinforcement learning, you will explore algorithms that learn near-optimal policies through trial-and-error interaction with the environment. The course covers intuitive yet powerful Monte Carlo methods, temporal difference learning methods such as Q-learning, and strategies for combining model-based planning with temporal difference updates to accelerate learning.

Topics include the importance of exploration, the connections between Monte Carlo and dynamic programming, and the differences between on-policy and off-policy control. You will gain practical experience by implementing and applying algorithms such as TD learning, Expected Sarsa, and Q-learning. You will also learn about planning with simulated experience and explore Dyna, a model-based approach to reinforcement learning.

By the end of the course, you will have the knowledge and skills to understand, implement, and apply sample-based learning methods in reinforcement learning, and conduct empirical studies to measure improvements in sample efficiency.

The course covers Monte Carlo methods for prediction and control, temporal difference learning methods for prediction and control, and planning, learning, and acting, with a focus on model-based reinforcement learning using the Dyna architecture.

Welcome to the Course!

In the introductory module, you will meet the instructors, get an overview of the reinforcement learning textbook, and review the course prerequisites and learning objectives.

Monte Carlo Methods for Prediction & Control

Module 2 delves into Monte Carlo methods for prediction and control, covering topics such as off-policy learning, importance sampling, and the use of Monte Carlo for generalized policy iteration.
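As a rough illustration of the off-policy idea in this module, the sketch below estimates a value under a target policy from episodes generated by a different behavior policy, using ordinary importance sampling. The function names and the dictionary-based policy representation are assumptions made for this example and are not taken from the course materials.

```python
def importance_sampling_ratio(episode, target_policy, behavior_policy):
    """Product of pi(a|s) / b(a|s) over the (state, action) pairs of one episode.

    Policies are assumed to be dicts of the form policy[state][action] -> probability.
    """
    rho = 1.0
    for state, action in episode:
        rho *= target_policy[state][action] / behavior_policy[state][action]
    return rho


def ordinary_importance_sampling(episodes, returns, target_policy, behavior_policy):
    """Average of the returns, each reweighted by its episode's importance ratio."""
    weighted = [
        importance_sampling_ratio(ep, target_policy, behavior_policy) * g
        for ep, g in zip(episodes, returns)
    ]
    return sum(weighted) / len(weighted)


# Hypothetical one-state example: the target policy always picks action 0,
# while the behavior policy picks either action with probability 0.5.
target = {"s0": {0: 1.0, 1: 0.0}}
behavior = {"s0": {0: 0.5, 1: 0.5}}
episodes = [[("s0", 0)], [("s0", 1)]]
returns = [1.0, 5.0]
estimate = ordinary_importance_sampling(episodes, returns, target, behavior)
```

Episodes that the target policy would never generate get a ratio of zero, so only the return from action 0 contributes to the estimate.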

Temporal Difference Learning Methods for Prediction

Module 3 focuses on temporal difference learning methods for prediction, exploring the advantages of TD learning, comparing TD and Monte Carlo, and policy evaluation with temporal difference learning.
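A minimal sketch of the one-step TD (TD(0)) prediction update may help make the contrast with Monte Carlo concrete: the value estimate is adjusted after every transition, using the current estimate of the next state instead of waiting for the full return. The function name, the dictionary of values, and the parameter defaults are assumptions for illustration only.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=1.0, terminal=False):
    """Apply one TD(0) update: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)).

    `values` is assumed to be a dict mapping states to value estimates.
    A terminal next state is treated as having value 0.
    """
    bootstrap = 0.0 if terminal else gamma * values[next_state]
    td_error = reward + bootstrap - values[state]
    values[state] += alpha * td_error
    return td_error


# Hypothetical two-state example.
values = {"A": 0.0, "B": 1.0}
error = td0_update(values, "A", reward=0.0, next_state="B", alpha=0.5, gamma=1.0)
```

Here the estimate for state A moves halfway toward the bootstrapped target, without waiting for the episode to finish.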

Temporal Difference Learning Methods for Control

Temporal difference learning methods for control are covered in Module 4, including Sarsa, Q-learning, and Expected Sarsa, with a focus on off-policy learning and learning multiple goals.
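The three control algorithms named here differ mainly in the target they bootstrap from. The sketch below computes each target for a single transition, assuming a tabular action-value function stored as a dict of lists and an epsilon-greedy policy for Expected Sarsa. All names are hypothetical and chosen for this example.

```python
def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy target: uses the action actually selected in the next state."""
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma):
    """Off-policy target: uses the greedy (maximum) action value in the next state."""
    return reward + gamma * max(q[next_state])


def expected_sarsa_target(q, reward, next_state, gamma, epsilon):
    """Uses the expected next action value under an epsilon-greedy policy."""
    action_values = q[next_state]
    n = len(action_values)
    greedy = max(range(n), key=lambda a: action_values[a])
    probs = [epsilon / n] * n          # exploration mass spread uniformly
    probs[greedy] += 1.0 - epsilon     # remaining mass on the greedy action
    expected = sum(p * v for p, v in zip(probs, action_values))
    return reward + gamma * expected


# Hypothetical next state with two actions.
q = {"s": [1.0, 3.0]}
t_sarsa = sarsa_target(q, 0.0, "s", next_action=0, gamma=1.0)
t_q = q_learning_target(q, 0.0, "s", gamma=1.0)
t_exp = expected_sarsa_target(q, 0.0, "s", gamma=1.0, epsilon=0.5)
```

With epsilon set to 0 the epsilon-greedy policy becomes greedy, and the Expected Sarsa target coincides with the Q-learning target.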

Planning, Learning & Acting

The final module, Module 5, delves into planning, learning, and acting, exploring the concept of a model, comparing sample and distribution models, and implementing model-based reinforcement learning using the Dyna architecture.
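To give a feel for how Dyna combines learning and planning, here is a simplified tabular Dyna-Q step. It assumes a deterministic environment, so the learned model stores a single (reward, next state) outcome per state-action pair. The function names, data structures, and default parameters are illustrative assumptions, not the course's implementation.

```python
import random


def q_update(q, s, a, r, s2, alpha, gamma):
    """One Q-learning update; states not yet in `q` are treated as having value 0."""
    best_next = max(q[s2].values()) if s2 in q and q[s2] else 0.0
    q.setdefault(s, {}).setdefault(a, 0.0)
    q[s][a] += alpha * (r + gamma * best_next - q[s][a])


def dyna_q_step(q, model, s, a, r, s2, alpha=0.1, gamma=0.95, planning_steps=5, rng=random):
    """Process one real transition, then replay simulated transitions from the model."""
    q_update(q, s, a, r, s2, alpha, gamma)   # direct RL from real experience
    model[(s, a)] = (r, s2)                  # model learning (deterministic assumption)
    for _ in range(planning_steps):          # planning with simulated experience
        ps, pa = rng.choice(list(model))     # sample a previously seen state-action pair
        pr, ps2 = model[(ps, pa)]
        q_update(q, ps, pa, pr, ps2, alpha, gamma)


# Hypothetical single transition from state "A" to a terminal state "T".
q, model = {}, {}
dyna_q_step(q, model, "A", "right", 1.0, "T",
            alpha=0.5, gamma=0.9, planning_steps=2, rng=random.Random(0))
```

Each real transition is reused several times through the model, which is the source of the improved sample efficiency that Dyna aims for.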
