MWF 1:00 - 1:50 pm Lectures: CAB-235
This course provides an introduction to reinforcement learning, which focuses on the study and design of learning agents that interact with a complex, uncertain world to achieve a goal. The course will cover multi-armed bandits, Markov decision processes, reinforcement learning, planning, and function approximation (online supervised learning). The course will take an information-processing approach to the study of intelligence and briefly touch on perspectives from psychology, neuroscience, and philosophy. The course will use the University of Alberta MOOC on Reinforcement Learning. Any student who understands the material in this course will understand the foundations of much of modern probabilistic artificial intelligence (AI) and be prepared to take more advanced courses, or to apply AI tools and ideas to real-world problems.
The course will use a MOOC on Reinforcement Learning, created by UAlberta CS Faculty members. Our course combines lectures and assignments from the MOOC with material taught in class.
By the end of the course, you will have a solid grasp of the main ideas in reinforcement learning, which is the primary approach to statistical decision-making. Any student who understands the material in this course will understand the foundations of much of modern probabilistic artificial intelligence (AI) and be prepared to take more advanced courses (in particular CMPUT 653: Theoretical Foundations of Reinforcement Learning, CMPUT 655: Reinforcement Learning I, CMPUT 609: Reinforcement Learning II, and CMPUT 653: Real-Time Policy Learning) or to apply AI tools and ideas to real-world problems. That person will be able to apply these tools and ideas in novel situations - for example, to determine whether the methods apply to some situation, and, if so, which will work most effectively. They will also be able to assess claims made by others, with respect to both software products and general frameworks, and also be able to appreciate some new research results.
The course will use Python 3. We will use elementary ideas of probability, calculus, and linear algebra, such as expectations of random variables, conditional expectations, derivatives and the chain rule, vectors, and matrices. Students should either be familiar with these topics or be ready to pick them up quickly as needed by consulting outside resources.
With a focus on AI as the design of agents learning from experience to predict and control their environment, topics will include
Course work consists of quizzes and assignments submitted through the Coursera platform. Each week, either one small programming assignment (notebook) or one multiple-choice quiz is due. There are also practice quizzes, due every Tuesday at midnight. Each week, you have to complete the quiz on the coming week's topic by midnight on Saturday, which means you must also have finished that week's lecture videos and readings by then. The course also has a midterm exam, given in class, a final exam at the end, and a written assignment.
There are 11 weekly practice quizzes in total, and you should do all of them. Because issues sometimes arise, we give you a mulligan: completing 10 of the 11 earns the full mark for this component. Practice quizzes are “ungraded” in the sense that you need at least 80% of the answers right to get full marks on each quiz. Each practice quiz has equal weight.
There are 10 graded assignments in total, and you should do all of them. Because issues sometimes arise, we give you a mulligan: completing 9 of the 10 earns the full mark for this component. Each graded assignment is either a Python notebook or a graded quiz. All assignments are due at 11:59 pm on the day listed in the schedule. Each graded assignment has equal weight.
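To make the mulligan concrete, here is a minimal sketch of how a component mark could be computed. It assumes that the mulligan is applied by dropping the lowest-scoring item and averaging the rest with equal weight; that interpretation, the function name `component_score`, and the example scores are illustrative assumptions, not the official grading procedure.

```python
def component_score(scores, mulligans=1):
    """Average of per-item scores (each in [0, 1]) after dropping the
    lowest `mulligans` items. Hypothetical helper for illustration only."""
    if len(scores) <= mulligans:
        raise ValueError("need more items than mulligans")
    # Keep the best items; every kept item carries equal weight.
    kept = sorted(scores, reverse=True)[: len(scores) - mulligans]
    return sum(kept) / len(kept)


# Ten graded assignments: nine completed with full marks, one missed (0.0).
assignment_scores = [1.0] * 9 + [0.0]
print(component_score(assignment_scores))
```

Under this assumption, a student who misses exactly one item but gets full marks on the others still receives the full mark for the component.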
Component weights:
Up to 15% of the mark from course components other than the final can be shifted to the final exam, no questions asked, whether or not any work was missed: “Optimized marks” (the Python script we used is available here).
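The idea behind the optimized mark can be sketched as follows. This is not the course's actual script: the function name `optimized_mark`, the component names, and the weights in the example are hypothetical, and the sketch assumes that "up to 15%" means up to 15 percentage points of total weight, moved from the weakest components first and only when the final exam score is higher.

```python
def optimized_mark(weights, scores, final_key="final", max_shift=15.0):
    """Shift up to `max_shift` percentage points of weight from other
    components to the final exam wherever that raises the total mark.

    weights: component -> weight in percentage points (summing to 100)
    scores:  component -> fraction earned, in [0, 1]
    """
    adjusted = dict(weights)
    remaining = max_shift
    # Visit the non-final components from weakest to strongest score.
    others = sorted((k for k in weights if k != final_key), key=lambda k: scores[k])
    for name in others:
        if remaining <= 0 or scores[name] >= scores[final_key]:
            break  # no budget left, or shifting would no longer help
        moved = min(adjusted[name], remaining)
        adjusted[name] -= moved
        adjusted[final_key] += moved
        remaining -= moved
    return sum(adjusted[k] * scores[k] for k in adjusted)


# Hypothetical weights and scores, for illustration only.
weights = {"practice": 10.0, "assignments": 30.0, "midterm": 20.0, "final": 40.0}
scores = {"practice": 1.0, "assignments": 0.9, "midterm": 0.5, "final": 0.8}
print(optimized_mark(weights, scores))
```

In this example the weak midterm gives up 15 points of weight to the final, so the total rises compared with the unshifted weighting. When the final is the weakest component, nothing is shifted and the mark is unchanged.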
We will be using videos from the RL MOOC. We will be using the following textbook extensively:
Sutton and Barto, Reinforcement Learning: An Introduction, MIT Press.
The book is available from the bookstore or online as a PDF here.
All assignments, written and programming, are to be done individually. No exceptions. Students must write their own answers and code. Students are permitted and encouraged to discuss assignment problems and the contents of the course. However, the discussion should always be about high-level ideas. Students should not discuss with each other (or tutors) while writing answers to written questions or programming. Absolutely no sharing of answers or code with other students or tutors. All sources used for problem solutions must be acknowledged, e.g. web sites, books, research papers, personal communication with people, etc. The University of Alberta is committed to the highest standards of academic integrity and honesty. Students are expected to be familiar with these standards regarding academic honesty and to uphold the policies of the University in this respect. Students are particularly urged to familiarize themselves with the provisions of the Code of Student Behaviour and avoid any behaviour which could potentially result in suspicions of cheating, plagiarism, misrepresentation of facts and/or participation in an offence. Academic dishonesty is a serious offence and can result in suspension or expulsion from the University. (GFC 29 SEP 2003)