[Starter Notebook] RL - Taxi Problem
This is a getting-started notebook for the Taxi Problem in the RL course. It contains basic instructions for using the notebook to make submissions, as well as the tasks to perform and the questions to answer. Please read the instructions carefully before you proceed. You are required to create a copy of the notebook before you start working with it.
Happy Solving!😀
What is the notebook about?
Problem - DP Algorithm
This problem deals with a taxi driver who can take different actions in different cities. The tasks you have to do are:
- Implement DP Algorithm to find the optimal sequence for the taxi driver
- Find optimal policies for sequences of varying lengths
- Explain a variation on the policy
How to use this notebook? 📝
- This is a shared template, and any edits you make here will not be saved. Make a copy in your own drive: click the "File" menu (top-left), then "Save a Copy in Drive", and work in your copy from then on.
- Update the config parameters. You can define the common variables here
Variable | Description |
---|---|
AICROWD_DATASET_PATH | Path to the file containing the test data. This should be an absolute path. |
AICROWD_RESULTS_DIR | Path to write the output to. |
AICROWD_ASSETS_DIR | If your notebook needs additional files (model weights, etc.), add them to a directory and specify the path to that directory here (please specify a relative path). The contents of this directory will be sent to AIcrowd for evaluation. |
AICROWD_API_KEY | To submit your code to AIcrowd, you need to provide your account's API key. This key is available at https://www.aicrowd.com/participants/me |
- Install packages. Please use the Install packages 🗃 section to install any packages you need.
Setup AIcrowd Utilities 🛠
We use this to bundle the files for submission and create a submission on AIcrowd. Do not edit this block.
!pip install -U git+https://gitlab.aicrowd.com/aicrowd/aicrowd-cli.git@notebook-submission-v2 > /dev/null
%load_ext aicrowd.magic
AIcrowd Runtime Configuration 🧷
Define the configuration parameters. Please place any files needed for the notebook to run under ASSETS_DIR. We will copy the contents of this directory to your final submission file 🙂
import os
AICROWD_DATASET_PATH = os.getenv("DATASET_PATH", os.getcwd()+"/40746340-4151-4921-8496-be10b3f8f5cf_hw2_q1.zip")
AICROWD_RESULTS_DIR = os.getenv("OUTPUTS_DIR", "results")
API_KEY = "" #Get your API key from https://www.aicrowd.com/participants/me
Download dataset files 📲
!aicrowd login --api-key $API_KEY
!aicrowd dataset download -c rl-taxi
!unzip -q $AICROWD_DATASET_PATH
DATASET_DIR = 'hw2_q1/'
!mkdir {DATASET_DIR}results/
Install packages 🗃
Please add all package installations in this section.
Import packages 💻
import numpy as np
import os
# ADD ANY IMPORTS YOU WANT HERE
class TaxiEnv_HW2:
    def __init__(self, states, actions, probabilities, rewards):
        self.possible_states = states
        self._possible_actions = {st: ac for st, ac in zip(states, actions)}
        self._ride_probabilities = {st: pr for st, pr in zip(states, probabilities)}
        self._ride_rewards = {st: rw for st, rw in zip(states, rewards)}
        self._verify()

    def _check_state(self, state):
        assert state in self.possible_states, "State %s is not a valid state" % state

    def _verify(self):
        """
        Verify that the data conditions are met:
        - The number of actions matches the shape of the probability and reward arrays
        - Every probability distribution adds up to 1
        """
        ns = len(self.possible_states)
        for state in self.possible_states:
            ac = self._possible_actions[state]
            na = len(ac)
            rp = self._ride_probabilities[state]
            assert np.all(rp.shape == (na, ns)), "Probabilities shape mismatch"
            rr = self._ride_rewards[state]
            assert np.all(rr.shape == (na, ns)), "Rewards shape mismatch"
            assert np.allclose(rp.sum(axis=1), 1), "Probabilities don't add up to 1"

    def possible_actions(self, state):
        """ Return all possible actions from a given state """
        self._check_state(state)
        return self._possible_actions[state]

    def ride_probabilities(self, state, action):
        """
        Return the ride probabilities from a state for a given action,
        as an array of values in the same order as self.possible_states.
        """
        actions = self.possible_actions(state)
        ac_idx = actions.index(action)
        return self._ride_probabilities[state][ac_idx]

    def ride_rewards(self, state, action):
        """
        Return the ride rewards from a state for a given action,
        in the same order as self.possible_states.
        """
        actions = self.possible_actions(state)
        ac_idx = actions.index(action)
        return self._ride_rewards[state][ac_idx]
Examples of using the environment functions
def check_taxienv():
    # These are the values as used in the pdf, but they may be changed during submission, so do not hardcode anything
    states = ['A', 'B', 'C']
    actions = [['1','2','3'], ['1','2'], ['1','2','3']]
    probs = [np.array([[1/2, 1/4, 1/4],
                       [1/16, 3/4, 3/16],
                       [1/4, 1/8, 5/8]]),
             np.array([[1/2, 0, 1/2],
                       [1/16, 7/8, 1/16]]),
             np.array([[1/4, 1/4, 1/2],
                       [1/8, 3/4, 1/8],
                       [3/4, 1/16, 3/16]]),]
    rewards = [np.array([[10, 4, 8],
                         [ 8, 2, 4],
                         [ 4, 6, 4]]),
               np.array([[14, 0, 18],
                         [ 8, 16, 8]]),
               np.array([[10, 2, 8],
                         [6, 4, 2],
                         [4, 0, 8]]),]
    env = TaxiEnv_HW2(states, actions, probs, rewards)
    print("All possible states", env.possible_states)
    print("All possible actions from state B", env.possible_actions('B'))
    print("Ride probabilities from state A with action 2", env.ride_probabilities('A', '2'))
    print("Ride rewards from state C with action 3", env.ride_rewards('C', '3'))

check_taxienv()
Task 1 - DP Algorithm implementation
Implement a DP algorithm that takes the starting state and the sequence length and returns the expected reward for the policy.
def dp_solve(taxienv):
    ## Implement the DP algorithm for the taxienv
    states = taxienv.possible_states
    values = {s: 0 for s in states}
    policy = {s: '0' for s in states}
    all_values = []    # Append the "values" dictionary to this after each update
    all_policies = []  # Append the "policy" dictionary to this after each update
    # Note: The sequence length is always N=10

    # ADD YOUR CODE BELOW - DO NOT EDIT ABOVE THIS LINE

    # DO NOT EDIT BELOW THIS LINE
    results = {"Expected Reward": all_values, "Polcies": all_policies}
    return results
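Conceptually, Task 1 is finite-horizon backward induction: at each of the N rounds, pick the action that maximizes the expected immediate reward plus the value of the landing state. Below is a minimal self-contained sketch of that recursion on a hypothetical 2-state MDP; the names `finite_horizon_dp` and `demo_*` are illustrative and not part of the starter API, and your `dp_solve` should instead use the env's accessor methods (`possible_actions`, `ride_probabilities`, `ride_rewards`).

```python
import numpy as np

def finite_horizon_dp(states, actions, probs, rewards, n_steps=10):
    """Backward induction: one sweep per remaining step; at each state pick the
    action maximizing expected immediate reward plus the next-state value."""
    values = {s: 0.0 for s in states}
    all_values, all_policies = [], []
    for _ in range(n_steps):
        new_values, policy = {}, {}
        for si, s in enumerate(states):
            best_q, best_a = -np.inf, None
            for ai, a in enumerate(actions[si]):
                p = probs[si][ai]    # P(s' | s, a), ordered like `states`
                r = rewards[si][ai]  # reward of the ride s -> s'
                nxt = np.array([values[sp] for sp in states])
                q = float(np.sum(p * (r + nxt)))
                if q > best_q:
                    best_q, best_a = q, a
            new_values[s], policy[s] = best_q, best_a
        values = new_values
        all_values.append(dict(values))
        all_policies.append(dict(policy))
    return all_values, all_policies

# Hypothetical 2-state MDP, same data layout as TaxiEnv_HW2's constructor
demo_states = ['A', 'B']
demo_actions = [['1'], ['1', '2']]
demo_probs = [np.array([[0.5, 0.5]]),
              np.array([[1.0, 0.0], [0.0, 1.0]])]
demo_rewards = [np.array([[4, 2]]),
                np.array([[1, 0], [0, 2]])]
vals, pols = finite_horizon_dp(demo_states, demo_actions, demo_probs, demo_rewards, n_steps=2)
```

Note how each sweep appends a copy of the current value and policy dictionaries, which mirrors what the starter's `all_values` and `all_policies` lists expect.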
Here is an example of what the "results" output from the dp_solve function should look like
Of course, it won't be all zeros.
{'Expected Reward': [{'A': 0, 'B': 0, 'C': 0},
{'A': 0, 'B': 0, 'C': 0},
{'A': 0, 'B': 0, 'C': 0},
{'A': 0, 'B': 0, 'C': 0},
{'A': 0, 'B': 0, 'C': 0},
{'A': 0, 'B': 0, 'C': 0},
{'A': 0, 'B': 0, 'C': 0},
{'A': 0, 'B': 0, 'C': 0},
{'A': 0, 'B': 0, 'C': 0},
{'A': 0, 'B': 0, 'C': 0}],
'Polcies': [{'A': '0', 'B': '0', 'C': '0'},
{'A': '0', 'B': '0', 'C': '0'},
{'A': '0', 'B': '0', 'C': '0'},
{'A': '0', 'B': '0', 'C': '0'},
{'A': '0', 'B': '0', 'C': '0'},
{'A': '0', 'B': '0', 'C': '0'},
{'A': '0', 'B': '0', 'C': '0'},
{'A': '0', 'B': '0', 'C': '0'},
{'A': '0', 'B': '0', 'C': '0'},
{'A': '0', 'B': '0', 'C': '0'}]}
if not os.path.exists(AICROWD_RESULTS_DIR):
    os.mkdir(AICROWD_RESULTS_DIR)

# DO NOT EDIT THIS CELL, DURING EVALUATION THE DATASET DIR WILL CHANGE
input_dir = os.path.join(DATASET_DIR, 'inputs')
for params_file in os.listdir(input_dir):
    kwargs = np.load(os.path.join(input_dir, params_file), allow_pickle=True).item()
    env = TaxiEnv_HW2(**kwargs)
    results = dp_solve(env)
    idx = params_file.split('_')[-1][:-4]
    np.save(os.path.join(AICROWD_RESULTS_DIR, 'results_' + idx), results)

## Modify this code to show the results for the policy and expected rewards properly
print(results)
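When it comes to showing the per-round values and policies in a readable way, a small formatting helper like the following can be used; it is only a sketch, and the `tabulate` name and `demo_*` data are illustrative (the demo data just has the same shape as the lists your solver appends to):

```python
def tabulate(all_values, all_policies):
    """Print one row per round: the optimal value and action for each state."""
    states = sorted(all_values[0])
    header = "Round | " + " | ".join(f"V({s}) Pi({s})" for s in states)
    lines = [header, "-" * len(header)]
    for n, (vals, pol) in enumerate(zip(all_values, all_policies), start=1):
        row = f"{n:5d} | " + " | ".join(f"{vals[s]:6.2f} {pol[s]:>5}" for s in states)
        lines.append(row)
    return "\n".join(lines)

# Hypothetical example data, one dict per round, as in the structure shown above
demo_values = [{'A': 3.0, 'B': 2.0}, {'A': 5.5, 'B': 4.0}]
demo_policies = [{'A': '1', 'B': '2'}, {'A': '1', 'B': '1'}]
print(tabulate(demo_values, demo_policies))
```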
Task 2 - Tabulate the optimal policy & optimal value for each state in each round for N=10
Modify this cell and add your answer.
Question - Consider a policy that always forces the driver to go to the nearest taxi stand, irrespective of the state. Is it optimal? Justify your answer.
Modify this cell and add your answer.
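One way to support your answer empirically is to evaluate a fixed (state → action) policy under the same finite-horizon recursion and compare its values to the optimal ones. A sketch, on a hypothetical 2-state MDP rather than the assignment's data; `evaluate_fixed_policy` and the demo data are illustrative names, not part of the starter:

```python
import numpy as np

def evaluate_fixed_policy(states, actions, probs, rewards, policy, n_steps=10):
    """Finite-horizon evaluation of a fixed policy: the same backward recursion
    as the DP, but with the action at each state forced by `policy`."""
    values = {s: 0.0 for s in states}
    for _ in range(n_steps):
        new_values = {}
        for si, s in enumerate(states):
            ai = actions[si].index(policy[s])   # forced action, no max over actions
            p, r = probs[si][ai], rewards[si][ai]
            nxt = np.array([values[sp] for sp in states])
            new_values[s] = float(np.sum(p * (r + nxt)))
        values = new_values
    return values

# Hypothetical 2-state example: treat action '1' as "go to the nearest stand"
states = ['A', 'B']
actions = [['1', '2'], ['1', '2']]
probs = [np.array([[1.0, 0.0], [0.0, 1.0]]),
         np.array([[1.0, 0.0], [0.0, 1.0]])]
rewards = [np.array([[1, 0], [0, 5]]),
           np.array([[2, 0], [0, 3]])]
fixed = evaluate_fixed_policy(states, actions, probs, rewards, {'A': '1', 'B': '1'})
```

If the fixed policy's values fall below the optimal values from your DP for some state, that policy cannot be optimal.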
Submit to AIcrowd 🚀
NOTE: PLEASE SAVE THE NOTEBOOK BEFORE SUBMITTING IT (Ctrl + S)
!DATASET_PATH=$AICROWD_DATASET_PATH aicrowd notebook submit -c rl-taxi -a assets