# IITM RL Final Project

## BSuite Benchmark for Reinforcement Learning

This notebook uses an open-source reinforcement learning benchmark known as bsuite.

https://github.com/deepmind/bsuite

BSuite is a collection of carefully designed experiments that investigate the core capabilities of a reinforcement learning agent.

Your task is to use any reinforcement learning techniques at your disposal to get high scores on the environments specified.

**Note**: Since the course is on Reinforcement Learning, please limit yourself to using traditional Reinforcement Learning algorithms.

**Do not use deep reinforcement learning.**

You will be implementing a traditional RL algorithm to solve 3 environments.
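As one illustration of what "traditional" means here, the sketch below shows tabular Q-learning with epsilon-greedy exploration. It is only an example of the family of methods allowed, not a required approach. The names `make_q_table`, `select_action`, and `q_update` and the default values of `alpha` and `gamma` are assumptions made for this example, and states are assumed to be hashable (for instance, tuples).

```python
import random
from collections import defaultdict

N_ACTIONS = 3  # each of the three environments exposes 3 discrete actions


def make_q_table():
    """Q-table that lazily creates a zero-initialised row for unseen states."""
    return defaultdict(lambda: [0.0] * N_ACTIONS)


def select_action(q, state, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    values = q[state]
    return values.index(max(values))  # ties resolve to the lowest action index


def q_update(q, state, action, reward, next_state, done, alpha=0.1, gamma=0.99):
    """One Q-learning step: move Q(s, a) towards the bootstrapped target."""
    target = reward if done else reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
```

Other classical methods, such as SARSA or Q-learning with linear function approximation, fit the same constraint.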

## Environment 1: CATCH

In this environment, the agent must move a paddle to intercept falling balls. Each ball falls straight down the column it starts in.

The observation is an array of shape (rows, columns) with binary values: 0 if a cell is empty, 1 if it contains the paddle or a ball.

There are 3 discrete actions available: ['stay', 'left', 'right'].

The episode terminates when the ball reaches the bottom of the screen.
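A tabular method needs a compact, hashable state rather than the raw grid. The sketch below compresses the observation into a `(ball_row, ball_col, paddle_col)` tuple. It assumes the paddle sits in the bottom row and that exactly one ball is on screen at a time; `encode_catch_state` is a hypothetical helper written for this example, not part of bsuite.

```python
def encode_catch_state(obs):
    """Compress a binary (rows, columns) grid into (ball_row, ball_col, paddle_col).

    Returns None when no ball is visible above the bottom row. Under the
    assumptions above this only happens on the final frame, where ball and
    paddle share a row and cannot be told apart.
    """
    grid = [list(row) for row in obs]  # accepts nested lists or a 2-D array
    last = len(grid) - 1

    # Assumption: the only non-zero cell in the bottom row is the paddle.
    paddle_cols = [c for c, v in enumerate(grid[last]) if v]

    # Scan the rows above the paddle for the ball.
    for r in range(last):
        for c, v in enumerate(grid[r]):
            if v and paddle_cols:
                return (r, c, paddle_cols[0])
    return None
```

With this encoding the number of distinct states is at most rows × columns × columns, which is small enough for a lookup table.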

## Environment 2: CARTPOLE

This environment implements a version of the classic Cartpole task, where the cart has to counter the movements of the pole to prevent it from falling over.

The observation is a vector representing: (x, x_dot, sin(theta), cos(theta), theta_dot, time_elapsed)

The actions are discrete and there are 3 of them available: ['left', 'stay', 'right'].

Episodes start with the pole close to upright. Episodes end when the pole falls, the cart falls off the table, or the max_time is reached.
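Because this observation is continuous, a tabular method needs it bucketed into a finite set of states first. The sketch below clips each component to a range and maps it to a bin index. The bounds and bin counts in `EXAMPLE_LOWS`, `EXAMPLE_HIGHS`, and `EXAMPLE_BINS` are illustrative guesses, not values taken from bsuite, and would need tuning; `discretize` is a hypothetical helper.

```python
def discretize(obs, lows, highs, bins):
    """Map a continuous observation vector to a tuple of bin indices."""
    state = []
    for x, lo, hi, n in zip(obs, lows, highs, bins):
        x = min(max(x, lo), hi)            # clip to the assumed range
        frac = (x - lo) / (hi - lo)        # rescale to [0, 1]
        state.append(min(int(frac * n), n - 1))  # keep the upper edge in the last bin
    return tuple(state)


# Assumed ranges for (x, x_dot, sin(theta), cos(theta), theta_dot, time_elapsed).
EXAMPLE_LOWS = [-1.0, -2.0, -1.0, -1.0, -3.0, 0.0]
EXAMPLE_HIGHS = [1.0, 2.0, 1.0, 1.0, 3.0, 1.0]
# A single bin for time_elapsed effectively ignores that component.
EXAMPLE_BINS = [4, 4, 6, 6, 6, 1]
```

Coarser bins learn faster but blur states that need different actions; finer bins do the opposite. The same helper could be applied to the Mountain Car observation with its own ranges.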

## Environment 3: MOUNTAIN CAR

This environment implements a version of the classic Mountain Car problem where an underpowered car must power up a hill.

The observation is a vector representing: (x, x_dot, time_elapsed)

There are 3 discrete actions available: ['push left', 'no push', 'push right']

Episodes start with the car at the bottom of the hill with no velocity. An episode ends when the car reaches position x=0.5 or after 1000 steps have been completed.

Each environment also has a NOISE variant, which adds scaled random noise to the rewards the agent receives. See the BSuite paper for more details.
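The episode structure described above can be driven by a short loop. The sketch below assumes the environment follows the `dm_env` interface that bsuite environments expose: `reset()` and `step(action)` return a time step with `observation`, `reward`, and a `last()` method. `run_episode` and `policy` are hypothetical names for this example, and `max_steps` is only a safety cap on top of the environment's own termination.

```python
def run_episode(env, policy, max_steps=1000):
    """Run one episode and return (total_reward, steps_taken).

    `policy` is any callable that maps an observation to an action index.
    """
    timestep = env.reset()
    total_reward = 0.0
    steps = 0
    while not timestep.last() and steps < max_steps:
        action = policy(timestep.observation)
        timestep = env.step(action)
        # The reward can be None on some time steps; treat that as zero.
        total_reward += timestep.reward or 0.0
        steps += 1
    return total_reward, steps
```

A learning agent would typically update its value estimates inside this loop, after each call to `env.step`.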

## Submission

Before submitting, make sure to accept the rules.

Go to the starter kit notebook and follow the instructions to implement your agent in the notebook.

## Scoring

We use BSuite's scoring system to determine the score for each environment. The final score is the sum of the scores across all test environments.