IITM RL Final Project
BSuite Challenge Starter Kit
IITM RL FINAL PROJECT¶
Problem - bsuite¶
This notebook uses an open source reinforcement learning benchmark known as bsuite. https://github.com/deepmind/bsuite
bsuite is a collection of carefully-designed experiments that investigate core capabilities of a reinforcement learning agent.
Your task is to use any reinforcement learning techniques at your disposal to get high scores on the environments specified.
Note: Since the course is on Reinforcement Learning, please limit yourself to traditional reinforcement learning algorithms; do not use deep reinforcement learning.
How to use this notebook? 📝¶
- This is a shared template and any edits you make here will not be saved. You should make a copy in your own drive. Click the "File" menu (top-left), then "Save a Copy in Drive". You can then work on your copy however you like.
- Update the config parameters. You can define the common variables here:

Variable | Description |
---|---|
AICROWD_RESULTS_DIR | Path to write the output to. |
AICROWD_ASSETS_DIR | In case your notebook needs additional files (like model weights, etc.), you can add them to a directory and specify the relative path to the directory here. The contents of this directory will be sent to AIcrowd for evaluation. |
AICROWD_API_KEY | In order to submit your code to AIcrowd, you need to provide your account's API key. This key is available at https://www.aicrowd.com/participants/me |
- Installing packages. Please use the Install packages 🗃 section to install all the packages you need.
!pip install -q aicrowd-cli
AIcrowd Runtime Configuration 🧷¶
Get login API key from https://www.aicrowd.com/participants/me
import os
AICROWD_RESULTS_DIR = os.getenv("OUTPUTS_DIR", "results")
os.environ["RESULTS_DIR"] = AICROWD_RESULTS_DIR
API_KEY = ""
!aicrowd login --api-key $API_KEY
Install packages 🗃¶
Please add all package installations in this section.
!pip install git+http://gitlab.aicrowd.com/nimishsantosh107/bsuite.git
!pip install tabulate
!pip install tqdm
## Add any other installations you need here
Import packages¶
import gym
import warnings
import numpy as np
import pandas as pd
import plotnine as gg
from tqdm.notebook import tqdm
import bsuite
from bsuite.aicrowd import environments
from bsuite.aicrowd.runner import Runner
from bsuite.aicrowd.analysis import Analyzer
pd.options.mode.chained_assignment = None
gg.theme_set(gg.theme_bw(base_size=16, base_family='serif'))
gg.theme_update(figure_size=(3, 1), panel_spacing_x=0.5, panel_spacing_y=0.5)
warnings.filterwarnings('ignore')
Agent Class¶
You can modify the AGENT TEMPLATE below and implement the logic of your agent. Your agent must implement a few methods that will be called by the Runner class:

- __init__ - put any initialization code here.
- get_action - takes in a state and returns an action.
- learn - takes in (state, action, reward, next_state, done) and implements the learning logic.
- get_state - takes in a raw observation directly from the env, discretizes it, and returns a state.

In addition to these, you may implement other methods which can be called by the above methods.

Since there are multiple environments, you may need unique hyperparameters for each environment. Instantiate the agent while passing the hyperparameters in a dictionary through the agent_config parameter, so that each environment can use different hyperparameters while sharing a single Agent class for all of them. You can use any names for the keys in the config dictionary.

An example RandomAgent is given below.
# *** YOU CAN EDIT THIS CELL ***
# AGENT TEMPLATE
class Agent:
def __init__(self, agent_config=None):
self.config = agent_config
pass
def get_action(self, state):
'''
PARAMETERS :
- state - discretized 'state'
RETURNS :
- action - 'action' to be taken
'''
raise NotImplementedError
return action
def learn(self, state, action, reward, next_state, done):
'''
PARAMETERS :
- state - discretized 'state'
- action - 'action' performed in 'state'
- reward - 'reward' received due to action taken
- next_state - discretized 'next_state'
- done - status flag to represent if an episode is done or not
RETURNS :
- NIL
'''
raise NotImplementedError
def get_state(self, observation):
'''
PARAMETERS :
- observation - raw 'observation' from environment
RETURNS :
- state - discretized 'state' from raw 'observation'
'''
raise NotImplementedError
return state
# *** YOU CAN EDIT THIS CELL ***
# DO NOT rename the config dictionaries as the evaluator references them. However, you may use any names for the keys in them.
catch_config = {"env_name": "catch"}
catch_noise_config = {"env_name": "catch_noise"}
cartpole_config = {"env_name": "cartpole"}
cartpole_noise_config = {"env_name": "cartpole_noise"}
mountaincar_config = {"env_name": "mountaincar"}
mountaincar_noise_config = {"env_name": "mountaincar_noise"}
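For example, you could extend one of these dictionaries with whatever hyperparameters your own agent reads. This is only a sketch; the key names alpha, gamma and epsilon below are illustrative, not names required by the evaluator.
# Hypothetical example of extra keys an agent might read from its config.
# Only "env_name" comes from the starter kit; the other keys are placeholders.
catch_config = {
    "env_name": "catch",
    "alpha": 0.1,    # learning rate
    "gamma": 0.99,   # discount factor
    "epsilon": 0.1,  # exploration rate
}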
# *** YOU CAN EDIT THIS CELL ***
# EXAMPLE
class RandomAgent:
def __init__(self, agent_config={}):
self.config = agent_config
self.env_name = self.config['env_name']
def get_action(self, state):
action = np.random.choice(2)
return action
def learn(self, state, action, reward, next_state, done):
if ('BAR' in self.config):
if (self.config['BAR']):
self.config['FOO'] += 1
def get_state(self, observation):
# In this function you're allowed to use
# the environment name for observation preprocessing
# Do not use it anywhere else
if self.env_name == 'catch':
state = observation
elif self.env_name == 'catch_noise':
state = observation
elif self.env_name == 'cartpole':
state = observation
elif self.env_name == 'cartpole_noise':
state = observation
elif self.env_name == 'mountaincar':
state = observation
elif self.env_name == 'mountaincar_noise':
state = observation
else:
raise NotImplementedError
return state
env1_config = {
"env_name": 'cartpole',
'FOO': 0.1,
'BAR': True
}
env2_config = {
"env_name": 'cartpole',
'FOO': 0.2,
'BAR': False
}
randomAgent1 = RandomAgent(agent_config=env1_config)
randomAgent2 = RandomAgent(agent_config=env2_config)
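Beyond the RandomAgent baseline, here is a minimal sketch of a tabular (non-deep) Q-learning agent that follows the same interface. Every key it reads from agent_config (n_actions, alpha, gamma, epsilon, round_digits) is an illustrative assumption, and the crude rounding in get_state is only a placeholder for a proper per-environment discretization (see the observation space limits further below).
# A minimal tabular Q-learning sketch, NOT a tuned solution.
from collections import defaultdict

class QLearningAgent:
    def __init__(self, agent_config={}):
        self.config = agent_config
        self.n_actions = self.config.get('n_actions', 3)  # e.g. env.action_space.n
        self.alpha = self.config.get('alpha', 0.1)         # learning rate
        self.gamma = self.config.get('gamma', 0.99)        # discount factor
        self.epsilon = self.config.get('epsilon', 0.1)     # exploration rate
        self.round_digits = self.config.get('round_digits', 1)
        self.q_table = defaultdict(lambda: np.zeros(self.n_actions))

    def get_action(self, state):
        # epsilon-greedy over the tabular Q-values
        if np.random.random() < self.epsilon:
            return np.random.choice(self.n_actions)
        return int(np.argmax(self.q_table[state]))

    def learn(self, state, action, reward, next_state, done):
        # one-step Q-learning update
        target = reward
        if not done:
            target += self.gamma * np.max(self.q_table[next_state])
        self.q_table[state][action] += self.alpha * (target - self.q_table[state][action])

    def get_state(self, observation):
        # crude placeholder discretization: flatten and round so the state is hashable
        return tuple(np.round(np.asarray(observation).flatten(), self.round_digits))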
Playing with the Environment¶
Instantiating the environment:¶
You can create an environment by calling the following function:

- environments.load_env(ENV_ID) - RETURNS: env

where ENV_ID can be ONE of the following:

- environments.CATCH
- environments.CATCH_NOISE
- environments.CARTPOLE
- environments.CARTPOLE_NOISE
- environments.MOUNTAINCAR
- environments.MOUNTAINCAR_NOISE

The NOISE environments add scaled random noise to the reward.
Running the environment:¶
There are certain methods required to run the environments. The interface is very similar to OpenAI Gym's interface; for more information, read the OpenAI Gym documentation.

- env.reset() - RETURNS: observation
- env.step(action) - RETURNS: (next_observation, reward, done, info [NOT USED])
There are also a few useful properties within the environments:

- env.action_space.n - total number of possible actions, e.g. if 'n' is 3, then the possible actions are [0, 1, 2]
- env.observation_space.shape - the shape of the observation.
- env.bsuite_num_episodes - the pre-specified number of episodes which will be run during evaluation (unique for each environment).
ONLY IN CATCH / CATCH_NOISE¶

- env.observation_space.high - the upper limit for every index of the observation.
- env.observation_space.low - the lower limit for every index of the observation.
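As a quick sanity check, you could load an environment and print the properties listed above; this is just a sketch using only those documented attributes.
env = environments.load_env(environments.CATCH)
print("number of actions:", env.action_space.n)
print("observation shape:", env.observation_space.shape)
print("evaluation episodes:", env.bsuite_num_episodes)
print("observation low:", env.observation_space.low)    # CATCH / CATCH_NOISE only
print("observation high:", env.observation_space.high)  # CATCH / CATCH_NOISE only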
Environment Observation Space Limits:¶
The limits for the observation space (minimum and maximum) for all the environments are given in the table below:
Environments | Limits |
---|---|
CATCH, CATCH_NOISE | MIN: use env.observation_space.low MAX: use env.observation_space.high |
CARTPOLE, CARTPOLE_NOISE | MIN: [-1., -5., -1., -1., -5., 0.] MAX: [1., 5., 1., 1., 5., 1.] |
MOUNTAINCAR, MOUNTAINCAR_NOISE | MIN: [-1.2, -0.07, 0.] MAX: [0.6, 0.07, 1.] |
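A common way to turn these continuous observations into discrete states is to bin each dimension between its limits, for example with np.digitize. Below is a minimal sketch for CARTPOLE / CARTPOLE_NOISE; the number of bins is an arbitrary choice, and a helper like this would typically live inside your Agent's get_state, selected per environment via env_name.
# Sketch: bin each CARTPOLE observation dimension into N_BINS buckets.
CARTPOLE_LOW  = np.array([-1., -5., -1., -1., -5., 0.])
CARTPOLE_HIGH = np.array([ 1.,  5.,  1.,  1.,  5., 1.])
N_BINS = 10  # arbitrary; tune per environment

# interior bin edges for each observation dimension
bins = [np.linspace(lo, hi, N_BINS + 1)[1:-1]
        for lo, hi in zip(CARTPOLE_LOW, CARTPOLE_HIGH)]

def discretize(observation):
    obs = np.asarray(observation).flatten()
    return tuple(int(np.digitize(x, b)) for x, b in zip(obs, bins))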
[NOTE] Use the code cell below to play around and get used to the environments. However, the Runner class below will be used to evaluate your agent.
# *** YOU CAN EDIT THIS CELL ***
# TEST AREA
env = environments.load_env(environments.CARTPOLE) # replace 'environments.CARTPOLE' with other environments
agent = RandomAgent(agent_config={"env_name": 'cartpole'}) # replace 'RandomAgent' with your own Agent class to test it
NUM_EPISODES = 10 # replace with 'env.bsuite_num_episodes' to run for pre-specified number of episodes
for episode_n in tqdm(range(NUM_EPISODES)):
done = False
episode_reward = 0
episode_moves = 0
observation = env.reset()
state = agent.get_state(observation)
while not done:
action = agent.get_action(state)
next_observation, reward, done, _ = env.step(action)
next_state = agent.get_state(next_observation)
agent.learn(state, action, reward, next_state, done)
state = next_state
episode_reward += reward
episode_moves += 1
if (((episode_n+1) % 2) == 0):
print("EPISODE: ",episode_n+1,"\tREWARD: ",episode_reward,"\tEPISODE_LENGTH: ",episode_moves)
Point to the Agent Class you'll use for the final score¶
RLAgent = RandomAgent # replace 'RandomAgent' with the Agent class you want evaluated
Evaluating the Agent on all the Environments¶
- The following cells will take care of running your agent on each environment and aggregating the results in csv files. In each of the following cells, the agent_config parameter is already set to use the corresponding config dictionary for that environment. DO NOT EDIT THIS.
- Feel free to modify the LOG_INTERVAL parameter to change the interval between episodes for logging.
- Please do not modify any other contents in each of the cells.
LOG_INTERVAL = 100
runner = Runner(
agent = RLAgent(agent_config=catch_config),
env_id = environments.CATCH,
log_interval = LOG_INTERVAL,
)
runner.play_episodes()
runner = Runner(
agent = RLAgent(agent_config=catch_noise_config),
env_id = environments.CATCH_NOISE,
log_interval = LOG_INTERVAL
)
runner.play_episodes()
runner = Runner(
agent = RLAgent(agent_config=cartpole_config),
env_id = environments.CARTPOLE,
log_interval = LOG_INTERVAL
)
runner.play_episodes()
runner = Runner(
agent = RLAgent(agent_config=cartpole_noise_config),
env_id = environments.CARTPOLE_NOISE,
log_interval = LOG_INTERVAL
)
runner.play_episodes()
runner = Runner(
agent = RLAgent(agent_config=mountaincar_config),
env_id = environments.MOUNTAINCAR,
log_interval = LOG_INTERVAL
)
runner.play_episodes()
runner = Runner(
agent = RLAgent(agent_config=mountaincar_noise_config),
env_id = environments.MOUNTAINCAR_NOISE,
log_interval = LOG_INTERVAL
)
runner.play_episodes()
Analysis & Result¶
The following cells will show the score of the agent on each environment. The same scoring method will be used to evaluate your agent on a set of test environments.
# *** PLEASE DONT EDIT THE CONTENTS OF THIS CELL ***
analyzer = Analyzer(os.environ.get('RESULTS_DIR'))
analyzer.print_scores()
# If you want an object containing the scores
analyzer.get_scores()
What is the score function?¶
The score function was developed by the bsuite team at DeepMind. It is open source and available at https://github.com/deepmind/bsuite
The score measures behavioral aspects of the agent only and does not take into account the internal state of the agent. For more details, read Section 2 of the bsuite paper. In this case we use only the "Basic" aspect of the agent's scoring system.
It is not necessary to understand the score in order to improve your agent's performance.
Backend Evaluation¶
THIS CODE WILL EVALUATE THE AGENT USING THE SPECIFIED CONFIGS FOR THE CORRESPONDING ENVIRONMENTS. DO NOT EDIT THE CONTENTS OF THIS CELL.
## Do not edit this cell
if (os.environ.get('BACKEND_EVALUATOR') is not None):
import backend_evaluator
runs = {
'catch': (
backend_evaluator.CATCH,
catch_config),
'catch_noise': (
backend_evaluator.CATCH_NOISE,
catch_noise_config),
'cartpole': (
backend_evaluator.CARTPOLE,
cartpole_config),
'cartpole_noise': (
backend_evaluator.CARTPOLE_NOISE,
cartpole_noise_config),
'mountaincar': (
backend_evaluator.MOUNTAINCAR,
mountaincar_config),
'mountaincar_noise': (
backend_evaluator.MOUNTAINCAR_NOISE,
mountaincar_noise_config)
}
for run_name, run in runs.items():
env_ids, config = run
for env_id in env_ids:
runner = Runner(env_id=env_id,
agent=RLAgent(agent_config=config),
verbose=False,
eval=True)
runner.play_episodes()
Submit to AIcrowd 🚀¶
NOTE: PLEASE SAVE THE NOTEBOOK BEFORE SUBMITTING IT (Ctrl + S)
! aicrowd notebook submit --no-verify -c iitm-rl-final-project -a assets