BABY STEPS - Getting Started¶
Author: Chia E Tungom
Email: bamtungom@protonmail.com
This notebook demonstrates the basic facets of the CityLearn environment. You can play with it to get familiar with the environment. Important aspects of the environment covered here include:
Observation Space (dataset)
Action Space (discrete or continuous)
Model (Policy)
Action (steps)
Evaluation (reward)
We use general purpose functions common to most RL environments for illustration.
Note: To run this notebook, place it in the root directory of your CityLearn Phase one repository (same directory as requirements.txt)
Let's Goooooo!!!
import numpy as np
import time
"""
Please do not make changes to this file.
This is only a reference script provided to allow you
to do local evaluation. The evaluator **DOES NOT**
use this script for orchestrating the evaluations.
"""
# to avoid crashes but might cause results to be different
# https://github.com/dmlc/xgboost/issues/1715
# import os
# os.environ['KMP_DUPLICATE_LIB_OK']='True'
from agents.orderenforcingwrapper import OrderEnforcingAgent
from citylearn.citylearn import CityLearnEnv
# Custom configured environment
class Constants:
    episodes = 3
    schema_path = './data/citylearn_challenge_2022_phase_1/schema.json'

def action_space_to_dict(aspace):
    """ Only for box space """
    return { "high": aspace.high,
             "low": aspace.low,
             "shape": aspace.shape,
             "dtype": str(aspace.dtype)
           }

def env_reset(env):
    observations = env.reset()
    action_space = env.action_space
    observation_space = env.observation_space
    building_info = env.get_building_information()
    building_info = list(building_info.values())
    action_space_dicts = [action_space_to_dict(asp) for asp in action_space]
    observation_space_dicts = [action_space_to_dict(osp) for osp in observation_space]
    obs_dict = {"action_space": action_space_dicts,
                "observation_space": observation_space_dicts,
                "building_info": building_info,
                "observation": observations }
    return obs_dict
1. Define Environment¶
The first thing we need to do is create a CityLearn environment. The environment is defined using a json schema and dataset which can be found in the data directory.
# Understand CityLearn Environment
env = CityLearnEnv(schema=Constants.schema_path)
2. OBSERVATION SPACE¶
The observation space describes the data of the environment. This is what the agent sees in order to decide which action to take.
In our environment the observation space has 5 entries, one per building. Each building has its own observation, which is a 28-dimensional 1D array representing the observation at one point in time. Our full observation is therefore a 5x28 array.
- Use `env.observation_space` to explore the observation space of the entire environment
- Use `env.observation_space[index]` to explore the observation space of a particular building (index 0 for building 1)
# There is an observation space for every building
# print(f' OBSERVATION SPACES {env.observation_space}')
# print(f' OBSERVATION SPACE for Building ONE is {env.observation_space[0]}')
# sample some observations
for building in range(5):
    print(f' SAMPLE OBSERVATION for Building {building+1} >>> {len(env.observation_space[building].sample()), env.observation_space[building].sample()}')
# we can see each observation is a 28-dimensional 1D numpy array, with every dimension bounded by the ranges given in the space's Box
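As a quick sanity check, we can also reset the environment and confirm we get one 28-dimensional observation per building. This is a minimal sketch, assuming the Phase 1 dataset with 5 buildings and that `env.reset()` returns a list of per-building observations (as used in `env_reset` above):
# sanity check: one observation vector per building (assumes the 5-building Phase 1 dataset)
obs = env.reset()
print(len(obs))      # expected: 5 (one entry per building)
print(len(obs[0]))   # expected: 28 (observation dimensions for building 1)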
3. ACTION SPACE¶
This shows us the type of actions we can take along with the dimension and property (discrete or continuous) of each action. In the CityLearn challenge, the actions are continuous and one-dimensional in the range [-1,1] for each building: 1 means charging and -1 means discharging.
- Based on our environment, the action space is a 5-element array, with each entry corresponding to the action space of one building.
- Each entry is of the form `[(-1,1), (1,), float32]`, which corresponds to `[(lower bound, upper bound), (dimension,), datatype]`.
- The lower bound is the smallest allowed value of an action, while the upper bound is the highest.
- The dimension is the number of values in an action, which here is 1 (use `action_space.sample()` to see an action).
- The datatype is the data type of our action, which here is float32.
The cell below illustrates the action space(s). Play with it to understand the actions.
`action_space.sample()` produces a random action.
Note: You must pick the action space of a given building in order to sample (use an index, e.g. `env.action_space[0]`).
# There is an action space for every building
print(f' ACTION SPACES {env.action_space}')
print(f' ACTION SPACE for Building ONE is {env.action_space[0]}')
# sample some actions
for action in range(5):
    print(f' SAMPLE ACTION for Building ONE >>> {env.action_space[0].sample()}')
# we can observe the actions are continuous in the range [-1,1]
4. Define A Model or Agent¶
The agent is the policy which decides what action to take given an observation. We can use rule-based agents. The CityLearn setting is built for multi-agent systems, but a single agent can also be used.
Here we just show how to load an agent.
from citylearn.agents.sac import SAC
# SAC??
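To make the idea of a policy concrete, here is a minimal sketch of a hand-written rule-based agent. The class name and the rule itself are illustrative (not part of CityLearn), and it assumes the hour of day sits at index 2 of each building's observation, as in the Phase 1 schema:
# Toy rule-based policy (illustrative only): charge overnight, discharge in the evening peak.
# Assumes hour-of-day is at index 2 of each building's observation (Phase 1 schema assumption).
class SimpleRuleBasedAgent:
    def compute_action(self, observations):
        actions = []
        for obs in observations:
            hour = obs[2]
            if hour >= 22 or hour <= 6:
                actions.append([0.5])    # charge the battery overnight
            elif 17 <= hour <= 21:
                actions.append([-0.5])   # discharge during the evening peak
            else:
                actions.append([0.0])    # otherwise do nothing
        return actions

# usage: actions = SimpleRuleBasedAgent().compute_action(env.reset())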
5. TAKING AN ACTION¶
As already explained with the action spaces, $n$ buildings will have $n$ actions with each action corresponding to one building. Therefore our actions should appear as follows
- The action should be a list with one entry per building; each entry is a list containing the action to be taken for that building.
- For example, for a five-building environment we could have:
Actions = [ ([0.0]), ([0.0]), ([0.0]), ([0.0]), ([0.0]) ]
A list of lists is also acceptable:
Actions = [ [0.0], [0.0], [0.0], [0.0], [0.0] ]
We take an action when we want to move one step ahead. We can do this using env.step(action)
When we take an action the output contains a tuple with the following:
- Next State
- Reward
- If the state is a Terminal State
- Information about the environment
# print(env_reset(env)["action_space"])
# env_reset(env)["observation_space"]
# env.reset()[0]
import random
Actions = [([random.uniform(-1,1)]) for _ in range(5)]
print(f' WE are about to take {Actions} \n')
next_state, reward, terminal, info = env.step(Actions)
print(f' NEXT STATE \n {next_state} \n')
print(f' REWARDS {reward} \n')
print(f' TERMINAL OR NOT >> {terminal} \n')
print(f' INFO {info}')
# obs_dict = env_reset(env)
# agent = OrderEnforcingAgent()
# print(agent.register_reset(obs_dict))
# env.step(agent.register_reset(obs_dict))
6. Evaluating Actions¶
After taking actions we can evaluate the performance of our agent or agents.
Evaluation is done using the final metrics, which are the price cost and the emission cost.
env.evaluate()
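The returned metrics can be unpacked as below. This is a minimal sketch; it assumes `env.evaluate()` returns the price cost first and the emission cost second, matching how the local evaluation script further down reads them:
# unpack the evaluation metrics (price cost first, emission cost second,
# as assumed by the local evaluation script below)
metrics = env.evaluate()
price_cost, emission_cost = metrics[0], metrics[1]
print(f'Price cost: {price_cost}, Emission cost: {emission_cost}')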
SAMPLE RUN or LOCAL EVALUATION¶
Some modifications have been made to the original code. For instance:
- We can run a test for a month, i.e. $30*24$ steps, to quickly evaluate our agent.
To do this we add the following code in the evaluation section:
# Skipping to shorten training time
days = 30*5
training_steps = 24*days
skipping = False
import numpy as np
import time

"""
Please do not make changes to this file.
This is only a reference script provided to allow you
to do local evaluation. The evaluator **DOES NOT**
use this script for orchestrating the evaluations.
"""

from agents.orderenforcingwrapper import OrderEnforcingAgent
from citylearn.citylearn import CityLearnEnv

class Constants:
    episodes = 5
    schema_path = './data/citylearn_challenge_2022_phase_1/schema.json'

def action_space_to_dict(aspace):
    """ Only for box space """
    return { "high": aspace.high,
             "low": aspace.low,
             "shape": aspace.shape,
             "dtype": str(aspace.dtype)
           }

def env_reset(env):
    observations = env.reset()
    action_space = env.action_space
    observation_space = env.observation_space
    building_info = env.get_building_information()
    building_info = list(building_info.values())
    action_space_dicts = [action_space_to_dict(asp) for asp in action_space]
    observation_space_dicts = [action_space_to_dict(osp) for osp in observation_space]
    obs_dict = {"action_space": action_space_dicts,
                "observation_space": observation_space_dicts,
                "building_info": building_info,
                "observation": observations }
    return obs_dict

def evaluate():
    print("Starting local evaluation")

    env = CityLearnEnv(schema=Constants.schema_path)
    agent = OrderEnforcingAgent()

    obs_dict = env_reset(env)

    agent_time_elapsed = 0

    step_start = time.perf_counter()
    actions = agent.register_reset(obs_dict)
    agent_time_elapsed += time.perf_counter() - step_start

    episodes_completed = 0
    num_steps = 0
    interrupted = False
    episode_metrics = []

    # Skipping to shorten training time
    days = 30*5
    training_steps = 24*days
    skipping = False

    try:
        while True:
            ### This is only a reference script provided to allow you
            ### to do local evaluation. The evaluator **DOES NOT**
            ### use this script for orchestrating the evaluations.
            observations, _, done, _ = env.step(actions)
            if done or skipping:
                episodes_completed += 1
                metrics_t = env.evaluate()
                metrics = {"price_cost": metrics_t[0], "emmision_cost": metrics_t[1]}
                if np.any(np.isnan(metrics_t)):
                    raise ValueError("Episode metrics are nan, please contact organizers")
                episode_metrics.append(metrics)
                print(f"Episode complete: {episodes_completed} | Latest episode metrics: {metrics}", )

                obs_dict = env_reset(env)

                step_start = time.perf_counter()
                actions = agent.register_reset(obs_dict)
                agent_time_elapsed += time.perf_counter() - step_start
            else:
                step_start = time.perf_counter()
                actions = agent.compute_action(observations)
                agent_time_elapsed += time.perf_counter() - step_start

            num_steps += 1
            if num_steps % 1000 == 0:
                print(f"Num Steps: {num_steps}, Num episodes: {episodes_completed}")

            ### End training in set time
            if num_steps % training_steps == 0:
                print(f"Num Steps: {num_steps}, Num episodes: {episodes_completed}")
            if num_steps == training_steps:
                print(f'ENDING TRAINING AFTER {training_steps} STEPS')
                skipping = True

            if episodes_completed >= Constants.episodes:
                break
    except KeyboardInterrupt:
        print("========================= Stopping Evaluation =========================")
        interrupted = True

    if not interrupted:
        print("=========================Completed=========================")

    if len(episode_metrics) > 0:
        print("Average Price Cost:", np.mean([e['price_cost'] for e in episode_metrics]))
        print("Average Emission Cost:", np.mean([e['emmision_cost'] for e in episode_metrics]))
    print(f"Total time taken by agent: {agent_time_elapsed}s")

if __name__ == '__main__':
    evaluate()
Setting Up the Environment: requirements.txt and yml files¶
Follow this link: https://stackoverflow.com/questions/48787250/set-up-virtualenv-using-a-requirements-txt-generated-by-conda