NeurIPS 2022: CityLearn Challenge

MARLISA 2022: Adopting MARLISA for CityLearn2022

Adopting MARLISA for CityLearn2022. A GuideBook and a Baseline


This notebook outlines how the multi-agent reinforcement learning algorithm MARLISA can be modified for the 2022 CityLearn Challenge. It gives a SOTA baseline for challengers to improve upon.

CityLearn MARLISA 2022

Author: Chia E Tungom
Email: bamtungom@protonmail.com
Date: Aug-11-2022

In this notebook we outline how the multi-agent reinforcement learning algorithm MARLISA can be modified for the 2022 CityLearn Challenge. For details on how MARLISA works, refer to the study, and for the original implementation refer to the CityLearn git repository.

In brief, the algorithm aims to achieve the following

  1. Implement a multi-agent version of the Soft Actor-Critic (SAC) algorithm
    • Design a reward function with individual and collective goals
    • Agents share information among each other using a leader-follower scheme
      • Each building has its own RL agent
  2. Use the algorithm to
    • Plan and control energy storage of a diverse set of buildings
    • Aim to reshape the aggregated curve of electricity demand
    • Decide how much heating and cooling is stored or released at any time
  3. Measure performance on the following metrics
    • Annual net electricity consumption in the district
    • Average daily peak demand
    • Annual peak demand
    • Ramping
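
The four metrics in item 3 can be computed from an hourly district load profile. The sketch below is only an illustration: the function name `district_metrics` is hypothetical, and the metric definitions (for example, ramping as the sum of absolute hour-to-hour changes, and net consumption counting only imports) are assumptions rather than the CityLearn implementation.

```python
def district_metrics(hourly_load, hours_per_day=24):
    """Return four district-level metrics for an hourly load profile in kWh."""
    if len(hourly_load) % hours_per_day != 0:
        raise ValueError("profile must cover whole days")
    days = [hourly_load[i:i + hours_per_day]
            for i in range(0, len(hourly_load), hours_per_day)]
    return {
        # Assumption: only electricity drawn from the grid counts, exports are clipped to 0.
        "net_consumption": sum(max(x, 0.0) for x in hourly_load),
        "average_daily_peak": sum(max(d) for d in days) / len(days),
        "peak_demand": max(hourly_load),
        # Assumption: ramping is the sum of absolute changes between consecutive hours.
        "ramping": sum(abs(b - a) for a, b in zip(hourly_load, hourly_load[1:])),
    }

# Two synthetic days: a flat profile followed by a peaky one.
profile = [1.0] * 24 + [0.5] * 12 + [3.0] * 12
metrics = district_metrics(profile)
print(metrics)
```

A flatter profile lowers the peak and ramping terms even when total consumption stays the same, which is what reshaping the aggregated demand curve is meant to achieve.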

Unlike the CityLearn environment in which MARLISA was originally implemented, the 2022 CityLearn Challenge environment has different environment features and a different optimization objective. In this notebook we aim to show how MARLISA can be adapted to the 2022 CityLearn Challenge. We do so through the following steps

  1. Setting Up the CityLearn 2022 Environment
  2. Getting Building information (should be MARLISA compatible)
  3. Some Data Overview
  4. Generate Building State Actions requirements
  5. Outline Modifications of MARLISA code
  6. Run MARLISA2022

Note: I only aim to make it easier for those who want to get started with MARL(ISA)

To run this notebook, you will need the starter kit git repo set up as follows

  • Get the MARLISA python file from the intelligent-environments-lab GitHub repo and place it in the agents folder
  • Get the common folder from the intelligent-environments-lab repo and place it in the root folder of your starter kit folder
  • Place this notebook in the root folder as well

This Notebook was made while listening to MONALISA. Enjoy!!!

Lets Gooooooooooo!!!

1. Load Environment

We begin by setting up our RL environment for CityLearn2022. Some of the code in this section was taken from the local evaluation python file in the starter kit repo.

In [2]:
# from citylearn import CityLearn
# !pip3 install sklearn
import matplotlib.pyplot as plt
from pathlib import Path
from agents.marlisa import MARLISA
import numpy as np
from tqdm import tqdm
import time
from citylearn.citylearn import CityLearnEnv

# Custom environment configuration
class Constants:
    episodes = 3
    schema_path = './data/citylearn_challenge_2022_phase_1/schema.json'

def action_space_to_dict(aspace):
    """ Only for box space """
    return { "high": aspace.high,
             "low": aspace.low,
             "shape": aspace.shape,
             "dtype": str(aspace.dtype)
    }

def env_reset(env):
    observations = env.reset()
    action_space = env.action_space
    observation_space = env.observation_space
    building_info = env.get_building_information()
    building_info = list(building_info.values())
    action_space_dicts = [action_space_to_dict(asp) for asp in action_space]
    observation_space_dicts = [action_space_to_dict(osp) for osp in observation_space]
    obs_dict = {"action_space": action_space_dicts,
                "observation_space": observation_space_dicts,
                "building_info": building_info,
                "observation": observations }
    return obs_dict

env = CityLearnEnv(schema=Constants.schema_path)
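
To see what `action_space_to_dict` produces without loading the environment, here is a small sketch that feeds it a stand-in object. The `SimpleNamespace` and its bounds are hypothetical placeholders for a real Box space, chosen only to show the shape of the resulting dictionary.

```python
from types import SimpleNamespace

def action_space_to_dict(aspace):
    """Summarize a Box-like space as a plain dictionary."""
    return {"high": aspace.high,
            "low": aspace.low,
            "shape": aspace.shape,
            "dtype": str(aspace.dtype)}

# Hypothetical stand-in for one building's action space (not a real gym Box).
fake_space = SimpleNamespace(high=[1.0], low=[-1.0], shape=(1,), dtype="float32")
summary = action_space_to_dict(fake_space)
print(summary)
```

`env_reset` applies the same conversion to every building's action and observation space, so each entry in `obs_dict` is a list with one such dictionary per building.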

2. Get Building Information

Building information is needed as input to the MARLISA algorithm. It contains information like

  • A building's solar power capacity
  • A building's annual demand for cooling, heating and DHW (domestic hot water)
  • And other information

To get building information, you can run the command env.get_building_information(). The information we get does not contain keys like Building_1, which are needed, but rather some sequence of random letters and numbers. We rename the buildings in the order they appear (I am not sure this renaming is proper, and if you have an idea please leave a comment below).

The Buildings are renamed the same way buildings are named in the CityLearn environment.

In [3]:
observations_spaces, actions_spaces = env.observation_space, env.action_space

building_info = env.get_building_information()

def RenameBuilding(Building_Info):
    # Rename buildings to Building_1, Building_2, ... in the order they appear
    newB = {}
    for i, building in enumerate(Building_Info):
        newB["Building_"+str(i+1)] = Building_Info[building]
    return newB

building_info = RenameBuilding(building_info)
{'solar_power': 4.0,
 'annual_dhw_demand': 0.0,
 'annual_cooling_demand': 0.0,
 'annual_heating_demand': 0.0,
 'annual_nonshiftable_electrical_demand': 0.001,
 'correlations_dhw': {'4894dda91227442ba7dc5b64b69066d7': nan,
  '2511617f742645d39774d23b13661e32': nan,
  'c5c1b860e97f49d1a98eb6d118a8975a': nan,
  'da29bf3bc8d1439da75734ee3566a522': nan},
 'correlations_cooling_demand': {'4894dda91227442ba7dc5b64b69066d7': nan,
  '2511617f742645d39774d23b13661e32': nan,
  'c5c1b860e97f49d1a98eb6d118a8975a': nan,
  'da29bf3bc8d1439da75734ee3566a522': nan},
 'correlations_heating_demand': {'4894dda91227442ba7dc5b64b69066d7': nan,
  '2511617f742645d39774d23b13661e32': nan,
  'c5c1b860e97f49d1a98eb6d118a8975a': nan,
  'da29bf3bc8d1439da75734ee3566a522': nan},
 'correlations_non_shiftable_load': {'4894dda91227442ba7dc5b64b69066d7': 0.196,
  '2511617f742645d39774d23b13661e32': 0.192,
  'c5c1b860e97f49d1a98eb6d118a8975a': 0.166,
  'da29bf3bc8d1439da75734ee3566a522': 0.25}}
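
The renaming step can be tried on a toy dictionary first. This is a minimal sketch: `rename_buildings` is a hypothetical variant of the function above, and the ids and values are made up to resemble the output of `env.get_building_information()`.

```python
def rename_buildings(info_by_id):
    """Map arbitrary building ids to Building_1..Building_n in insertion order."""
    return {f"Building_{i}": info
            for i, info in enumerate(info_by_id.values(), start=1)}

# Hypothetical ids and values, shaped like the real building information.
raw_info = {
    "4894dda9": {"solar_power": 4.0},
    "2511617f": {"solar_power": 5.0},
}
renamed = rename_buildings(raw_info)
print(list(renamed))
```

Only the top-level keys change. Nested dictionaries such as `correlations_dhw` keep the original ids as keys, as the output above shows.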

3. A Look at Building Data

Taking a look at the building data, we see that the observations

  • Indoor Temperature [C], Average Unmet Cooling Setpoint Difference [C] and Indoor Relative Humidity [%]: all have NaN values
  • DHW Heating [kWh], Cooling Load [kWh] and Heating Load [kWh]: all have value 0

If this is also the case when sampled in the environment, we don't need these variables for learning a model, as they don't provide any information about a building or state.

This is useful because inside MARLISA we can set which observations are to be removed if need be.

In [9]:
import pandas as pd
import plotly.express as px

data = pd.read_csv('data/citylearn_challenge_2022_phase_1/Building_1.csv')
# Number of missing values in each column
Missing = data.isnull().sum()

fig = px.bar(x = Missing.index, y = Missing.values,
             color = Missing.index, title = "Count of Missing Data")
fig.show()
In [6]:
Statistic | Month | Hour | Day Type | Daylight Savings Status | Indoor Temperature [C] | Average Unmet Cooling Setpoint Difference [C] | Indoor Relative Humidity [%] | Equipment Electric Power [kWh] | DHW Heating [kWh] | Cooling Load [kWh] | Heating Load [kWh] | Solar Generation [W/kW]
count | 8760.000000 | 8760.000000 | 8760.000000 | 8760.0 | 0.0 | 0.0 | 0.0 | 8760.000000 | 8760.0 | 8760.0 | 8760.0 | 8760.000000
mean | 6.526027 | 12.500000 | 3.992466 | 0.0 | NaN | NaN | NaN | 1.208145 | 0.0 | 0.0 | 0.0 | 205.836089
std | 3.448048 | 6.922582 | 2.003522 | 0.0 | NaN | NaN | NaN | 0.968270 | 0.0 | 0.0 | 0.0 | 290.977786
min | 1.000000 | 1.000000 | 1.000000 | 0.0 | NaN | NaN | NaN | 0.057000 | 0.0 | 0.0 | 0.0 | 0.000000
25% | 4.000000 | 6.750000 | 2.000000 | 0.0 | NaN | NaN | NaN | 0.570167 | 0.0 | 0.0 | 0.0 | 0.000000
50% | 7.000000 | 12.500000 | 4.000000 | 0.0 | NaN | NaN | NaN | 0.812079 | 0.0 | 0.0 | 0.0 | 0.000000
75% | 10.000000 | 18.250000 | 6.000000 | 0.0 | NaN | NaN | NaN | 1.530529 | 0.0 | 0.0 | 0.0 | 412.108333
max | 12.000000 | 24.000000 | 7.000000 | 0.0 | NaN | NaN | NaN | 7.987483 | 0.0 | 0.0 | 0.0 | 976.250000
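
Columns that are entirely NaN or hold a single constant value can be flagged automatically. The sketch below uses plain Python on a tiny synthetic table. The helper name `uninformative_columns` and the sample values are assumptions for illustration, not part of MARLISA or CityLearn.

```python
import math

def uninformative_columns(table):
    """Return names of columns that are entirely NaN or hold one constant value."""
    dropped = []
    for name, values in table.items():
        finite = [v for v in values if not math.isnan(v)]
        if not finite or len(set(finite)) == 1:
            dropped.append(name)
    return dropped

# Tiny synthetic table mimicking a few of the building data columns.
nan = float("nan")
table = {
    "Hour": [1.0, 2.0, 3.0],
    "Indoor Temperature [C]": [nan, nan, nan],
    "Cooling Load [kWh]": [0.0, 0.0, 0.0],
    "Solar Generation [W/kW]": [0.0, 412.1, 976.25],
}
flagged = uninformative_columns(table)
print(flagged)
```

Applied to the statistics above, the same rule would also flag Daylight Savings Status, whose minimum and maximum are both 0.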

4. Generate Building States Action Requirements

This is an input to the MARLISA algorithm and is used to determine which states or actions are needed for each building. Unlike in the 2020 MARLISA, where a building requirements file is nicely provided, the information in the 2022 challenge is in the schema.json file. The requirements for each building state and action are provided (I assume they are the same for all buildings), but we need to modify them so that our input suits the algorithm.

In the file, a state with value True means it is required for the given building, and an action with value True means that action needs to be taken for that building. False means otherwise.

We see that only the electrical storage action is needed.

NOTE: The order of the states is the same as the order of the observations, i.e. if month is first, it is at index 0 of the observation. For example, hour comes third, which means it is found at index 2.

After generating the requirements, you can save them in JSON format for use as input, or pass them to your algorithm as a dictionary. Here we use the latter. If you use the latter, comment out lines 37-38 of the MARLISA code, where the file is read in.
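
If you prefer the file route, saving and reloading the requirements is a short JSON round trip. This is a sketch with a toy dictionary. The file name is a hypothetical choice, and a temporary directory is used only to keep the example self-contained.

```python
import json
import os
import tempfile

# Toy requirements dictionary with the same layout as the one generated in this section.
requirements = {
    "Building_1": {"states": {"month": True, "hour": True},
                   "action": {"electrical_storage": True}},
}

# Hypothetical file name, written to a temporary directory to avoid clutter.
path = os.path.join(tempfile.mkdtemp(), "buildings_state_action_space.json")
with open(path, "w") as f:
    json.dump(requirements, f, indent=2)

# Read the file back the way an agent would when given a path instead of a dict.
with open(path) as f:
    restored = json.load(f)
print(restored == requirements)
```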

In [11]:
import json
from copy import deepcopy

def GenStateActionFromJson(JsonPath, BuildingCount = 5):
    """Build the per-building state/action requirements from schema.json."""
    with open(JsonPath) as json_file:
        schema = json.load(json_file)

    # Keep only the observations and actions flagged as active in the schema
    StateINFo = {var: True for var, ins in schema['observations'].items() if ins['active']}
    ActionINFo = {act: True for act, ins in schema['actions'].items() if ins['active']}
    INFos = {"states": StateINFo, "action": ActionINFo}

    # The schema holds one set of requirements, so every building gets its own copy
    return {"Building_" + str(key): deepcopy(INFos) for key in range(1,BuildingCount+1)}

JsonFile = 'data/citylearn_challenge_2022_phase_1/schema.json'
BuildingsStatesActions = GenStateActionFromJson(JsonFile, BuildingCount = 5)

{'month': True,
 'day_type': True,
 'hour': True,
 'outdoor_dry_bulb_temperature': True,
 'outdoor_dry_bulb_temperature_predicted_6h': True,
 'outdoor_dry_bulb_temperature_predicted_12h': True,
 'outdoor_dry_bulb_temperature_predicted_24h': True,
 'outdoor_relative_humidity': True,
 'outdoor_relative_humidity_predicted_6h': True,
 'outdoor_relative_humidity_predicted_12h': True,
 'outdoor_relative_humidity_predicted_24h': True,
 'diffuse_solar_irradiance': True,
 'diffuse_solar_irradiance_predicted_6h': True,
 'diffuse_solar_irradiance_predicted_12h': True,
 'diffuse_solar_irradiance_predicted_24h': True,
 'direct_solar_irradiance': True,
 'direct_solar_irradiance_predicted_6h': True,
 'direct_solar_irradiance_predicted_12h': True,
 'direct_solar_irradiance_predicted_24h': True,
 'carbon_intensity': True,
 'non_shiftable_load': True,
 'solar_generation': True,
 'electrical_storage_soc': True,
 'net_electricity_consumption': True,
 'electricity_pricing': True,
 'electricity_pricing_predicted_6h': True,
 'electricity_pricing_predicted_12h': True,
 'electricity_pricing_predicted_24h': True}
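
Because the order of the active states matches the order of the observation vector, a name-to-index lookup is easy to build. A minimal sketch, using the first four names from the listing above and made-up observation values:

```python
# First four active observations, in the order listed above.
active_states = ["month", "day_type", "hour", "outdoor_dry_bulb_temperature"]

# Position of each observation in a building's observation vector.
state_index = {name: i for i, name in enumerate(active_states)}

observation = [7, 4, 13, 21.5]  # hypothetical values for one time step
hour = observation[state_index["hour"]]
print(state_index["hour"], hour)
```

A lookup like this avoids hard-coding positions when you later decide which observations to encode or remove.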

5. Internal MARLISA Modification

Tips for modifying the MARLISA code for the 2022 challenge. Line numbers refer to the MARLISA python file.

5.1. Initialization (init body)

[Line 58-65] Extract information for individual buildings.

[Line 75-83] Energy size coefficient for every building (not needed in 2022).

[Line 86-111] Define encoder: set a regression learner for every building, define an encoding for every observation (think of it as a data column) and set the target variable to be removed. MODIFY ACCORDING TO THE CURRENT DATASET.

[Line 131-145] Define regression encoder: used for transforming states in the regression model. MODIFY ACCORDING TO THE CURRENT DATASET.

[Line 149-164] Solar capacity (remove variables if no solar PV): removes solar radiation related variables for houses without PV. COMMENT THIS SECTION OUT.
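
To make "define an encoding for every observation" concrete, here is a sketch of two common encodings: a periodic one for cyclic features such as hour or month, and a min-max one for bounded features. The function names are hypothetical and these are not the encoder classes shipped with MARLISA. They only illustrate the idea.

```python
import math

def periodic_encode(value, period):
    """Encode a cyclic feature (hour, month) as a point on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def min_max_encode(value, low, high):
    """Scale a bounded feature to the range [0, 1]."""
    return (value - low) / (high - low)

def distance(p, q):
    """Euclidean distance between two encoded points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Hour 24 and hour 1 are neighbours in time, so their encodings should be close.
d_wrap = distance(periodic_encode(24, 24), periodic_encode(1, 24))
d_far = distance(periodic_encode(24, 24), periodic_encode(12, 24))
print(d_wrap < d_far, min_max_encode(0.5, 0.0, 1.0))
```

The periodic form avoids an artificial jump between the last and first hour of the day, which a raw integer would introduce.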

5.2. Select Action (MARLISA method or function)

  • Takes as inputs states and deterministic
    • states: the states of the buildings
    • deterministic: boolean, can be True or False

[Line 222-226] Initialize coordination variables: the coordination variables are two-dimensional for every building. MODIFY ACCORDING TO THE CURRENT DATASET (in 2022 we are asked to optimize for electricity cost and carbon emissions).

  • Capacity Dispatched: the total amount of electricity already dispatched. It is related to the energy size coefficient at every time step
  • Electrical Demand: related to the total electricity demand estimated by a prediction algorithm. It differs between the information sharing and non information sharing cases
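
The data layout of the coordination variables can be sketched as one two-element list per building. The helper `share` and the numbers are hypothetical. How MARLISA actually computes and normalizes these values is not shown here.

```python
building_ids = [f"Building_{i}" for i in range(1, 6)]

# One [capacity_dispatched, electrical_demand] pair per building, starting at zero.
# A list comprehension gives each building its own list (unlike [[0.0, 0.0]] * n).
coordination_vars = [[0.0, 0.0] for _ in building_ids]

def share(coordination_vars, k, capacity_dispatched, electrical_demand):
    """Store the pair of values that building k passes on to the other agents."""
    coordination_vars[k] = [capacity_dispatched, electrical_demand]

share(coordination_vars, 0, 0.2, 1.5)  # hypothetical values for Building_1
print(coordination_vars)
```

If you change what is shared for the 2022 objective, for example a cost or emission estimate, the width of each pair and the network input sizes that depend on it would need to change together.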

AFTER the recommended modifications you can make a trial run as shown next

The REWARD FUNCTION can also be modified
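
As one possible starting point for a 2022 reward, the sketch below penalizes electricity cost and emissions for a single building and hour. The function `cost_and_carbon_reward` is hypothetical and is not the reward used by MARLISA or the challenge. The argument names mirror the net_electricity_consumption, electricity_pricing and carbon_intensity observations.

```python
def cost_and_carbon_reward(net_consumption, price, carbon_intensity):
    """Negative sum of electricity cost and emissions for one building and one hour.

    Assumption: exports (negative consumption) are clipped to zero and earn no credit.
    """
    imported = max(net_consumption, 0.0)
    return -(imported * price + imported * carbon_intensity)

# Hypothetical values: 2 kWh imported at a price of 0.25 and carbon intensity of 0.5.
reward = cost_and_carbon_reward(2.0, 0.25, 0.5)
print(reward)
```

The two terms have different units, so in practice you would probably want to weight or normalize them before summing.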


6. Run MARLISA2022

To run MARLISA, we need to give it parameters that influence its learning.

The MARLISA algorithm takes inputs that can be classified into two categories.

6.1. Environment Parameters

These are parameters specific to the reinforcement learning environment (CityLearn version). They give information about the simulation environment that will be used. Details about these environment variables are explored above. To understand the 2022 CityLearn environment, check this notebook. A summary is laid out as follows.

  • 'building_ids': These are the building identities in the environment, written in the form "Building_id" where id is a building number. The building_ids can look as follows
    • ["Building_1", "Building_2", ... , "Building_n"]
  • 'buildings_states_actions': This is a JSON file defining the different states and actions possible for a building, e.g. if a building has
    • states {day : False, temp: True}, it means there will be information for temp but not for day
    • "actions": {"cooling_storage": true, "dhw_storage": true, "electrical_storage": false}, it means no electrical storage action is required, or electrical storage is absent in the building
  • 'building_info': Gives valuable information about a building like
    • solar_power_capacity (kW)
    • Annual_DHW_demand (kWh)
    • Annual_cooling_demand (kWh)
    • Annual_nonshiftable_electrical_demand (kWh)
    • etc
  • 'observation_spaces': This is information about the observation space of every building in the environment.
    • It contains n arrays where n is the number of buildings
    • Each array contains the lower and upper bound for the building observation along with its dimension and datatype
  • 'action_spaces': This is information about the action space of every building in the environment
    • It contains n arrays where n is the number of buildings
    • Each array contains the lower and upper bound for the building action along with its dimension and datatype

6.2. Algorithm Parameters

These are parameters specific to our reinforcement learning algorithm. Details about these parameters can be found in the paper. The settings below are as provided in the original MARLISA implementation found here

  • hidden_dim:[256,256],
  • discount:0.99,
  • tau:5e-3,
  • lr:3e-4,
  • batch_size:256,
  • replay_buffer_capacity:1e5,
  • regression_buffer_capacity:3e4,
  • start_training:600, # Start updating actor-critic networks
  • exploration_period:7500, # Just taking random actions
  • start_regression:500, # Start training the regression model
  • information_sharing:True, # If True -> set the appropriate 'reward_function_ma' in reward_function.py
  • pca_compression:.95,
  • action_scaling_coef:0.5, # Actions are multiplied by this factor to prevent too aggressive actions
  • reward_scaling:5., # Rewards are normalized and multiplied by this factor
  • update_per_step:2, # How many times the actor-critic networks are updated every hourly time-step
  • iterations_as:2,# Iterations of the iterative action selection (see MARLISA paper for more info)
  • safe_exploration:True
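
The three timing parameters interact with the episode length. The sketch below shows which activities would be active at a given step, assuming each threshold is compared against a running time-step counter. That assumption and the helper name `training_phase` are mine and should be checked against the MARLISA code.

```python
def training_phase(time_step, start_regression=500, start_training=600,
                   exploration_period=7500):
    """Describe what would be active at a given time step under the listed settings."""
    return {
        "exploratory_actions": time_step <= exploration_period,
        "regression_training": time_step >= start_regression,
        "network_updates": time_step >= start_training,
    }

steps_per_episode = 8760  # one year of hourly data, as in Building_1.csv
share_exploring = 7500 / steps_per_episode

print(training_phase(550))
print(f"{share_exploring:.0%} of a one-year episode falls in the exploration period")
```

Under this assumption, a single 8760-step episode spends most of its time exploring, so you may want to shorten exploration_period or train for more than one episode.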

NOTE: Set the number of buildings to match the environment. Ensure the generated BuildingsStatesActions has the same number of buildings as the given environment.
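
A quick check before instantiating the agent can catch mismatched building counts early. This is a sketch with a hypothetical helper, `check_building_counts`. The placeholder inputs stand in for the real spaces and requirements.

```python
def check_building_counts(building_ids, states_actions, observation_spaces, action_spaces):
    """Raise ValueError if the per-building inputs disagree on the number of buildings."""
    counts = {
        "building_ids": len(building_ids),
        "buildings_states_actions": len(states_actions),
        "observation_spaces": len(observation_spaces),
        "action_spaces": len(action_spaces),
    }
    if len(set(counts.values())) != 1:
        raise ValueError(f"building counts differ: {counts}")
    missing = [b for b in building_ids if b not in states_actions]
    if missing:
        raise ValueError(f"no state/action requirements for: {missing}")
    return counts["building_ids"]

# Placeholder inputs for two buildings; None stands in for a real space object.
ids = ["Building_1", "Building_2"]
n = check_building_counts(ids, {b: {} for b in ids}, [None, None], [None, None])
print(n)
```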

In [25]:
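params_agent = {'building_ids':["Building_"+str(i) for i in [1,2,3,4,5]],
                 'buildings_states_actions':BuildingsStatesActions, # Dictionary generated in section 4
                 'building_info':building_info,
                 'observation_spaces':observations_spaces,
                 'action_spaces':actions_spaces,
                 'hidden_dim':[256,256],
                 'discount':0.99,
                 'tau':5e-3,
                 'lr':3e-4,
                 'batch_size':256,
                 'replay_buffer_capacity':1e5,
                 'regression_buffer_capacity':3e4,
                 'start_training':600, # Start updating actor-critic networks
                 'exploration_period':7500, # Just taking random actions
                 'start_regression':500, # Start training the regression model
                 'information_sharing':True, # If True -> set the appropriate 'reward_function_ma' in reward_function.py
                 'pca_compression':.95,
                 'action_scaling_coef':0.5, # Actions are multiplied by this factor to prevent too aggressive actions
                 'reward_scaling':5., # Rewards are normalized and multiplied by this factor
                 'update_per_step':2, # How many times the actor-critic networks are updated every hourly time-step
                 'iterations_as':2,# Iterations of the iterative action selection (see MARLISA paper for more info)
                 'safe_exploration':True}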

# Instantiating the control agent(s)
agents = MARLISA(**params_agent)

RUNTIME = 30*24
shortcut = True
cutshort = True
# We will use 1 episode if we intend to simulate a real-time RL controller (like in the CityLearn Challenge)
# In climate zone 5, 1 episode contains 5 years of data, or 8760*5 time-steps.
n_episodes = 1
start = time.time()
for e in tqdm(range(n_episodes)): 
    state = env.reset()
    done = False
    j = 0
    is_evaluating = False
    action, coordination_vars = agents.select_action(state, deterministic=is_evaluating)

    while (not done) and shortcut:
        next_state, reward, done, _ = env.step(action)
        action_next, coordination_vars_next = agents.select_action(next_state, deterministic=is_evaluating)
        agents.add_to_buffer(state, action, reward, next_state, done, coordination_vars, coordination_vars_next)
        coordination_vars = coordination_vars_next
        state = next_state
        action = action_next

        if j%(24*30) == 0:
            print(f' We are now in step <><><> {j}, with score >> {env.evaluate()}')
        # if cutshort:
        #     is_evaluating = (j > (3*RUNTIME)/2)
        #     j += 1
        is_evaluating = (j > 3*8760)
        j += 1
        # if j >= RUNTIME:
        #     shortcut = False
    print('Loss -',env.evaluate(), 'Simulation time (min) -',(time.time()-start)/60.0)
    # CPU training for 603mins
  0%|          | 0/1 [00:00<?, ?it/s]
 We are now in step <><><> 0, with score >> (1.0855277538865769, 1.0811491864057319)
 We are now in step <><><> 720, with score >> (0.9868897065900977, 1.0260397598112843)
 We are now in step <><><> 1440, with score >> (0.984210580183218, 1.031761640611385)
 We are now in step <><><> 2160, with score >> (0.981169706402617, 1.0367885069305076)
 We are now in step <><><> 2880, with score >> (0.9766602408660171, 1.0377534180708696)
 We are now in step <><><> 3600, with score >> (0.9735084242276653, 1.0331572967458338)
 We are now in step <><><> 4320, with score >> (0.9703696538312334, 1.0285608652459208)
 We are now in step <><><> 5040, with score >> (0.9693617695491854, 1.0309809631859472)
 We are now in step <><><> 5760, with score >> (0.9715486226832148, 1.0366199298077803)
 We are now in step <><><> 6480, with score >> (0.9761026935782113, 1.0433876572747864)
 We are now in step <><><> 7200, with score >> (0.9790887221095199, 1.0466372048291206)
 We are now in step <><><> 7920, with score >> (0.9816487792165054, 1.0439024707244202)
 We are now in step <><><> 8640, with score >> (0.9833386774997737, 1.0382517776113684)
100%|██████████| 1/1 [43:58<00:00, 2638.14s/it]
Loss - (0.9836475699043856, 1.037550910926995) Simulation time (min) - 43.96900788545609
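
To read the final result, it helps to average the two values and check which side of 1 each falls on. The sketch assumes the tuple holds two costs normalized against a baseline without battery control, so that values below 1 are improvements. That interpretation is an assumption and should be confirmed against the challenge documentation.

```python
# Final pair of scores printed by env.evaluate() in the run above.
final_scores = (0.9836475699043856, 1.037550910926995)

# Assumption: each value is a ratio against a baseline, so below 1.0 means improvement.
average_score = sum(final_scores) / len(final_scores)
improved = [s < 1.0 for s in final_scores]
print(round(average_score, 4), improved)
```

Under that reading, this run improves slightly on the first score and is slightly worse on the second, which leaves room for tuning the reward function and the exploration settings.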


Leave a comment for any questions, doubts or corrections


About 1 year ago

Have you been able to submit your MARLISA solution?

I had to hardcode building_info and other pieces of data, as it looks like the wrapper is not receiving them in the online evaluation.

About 1 year ago

Hey @felipe thanks for pointing that out. I haven’t submitted my MARLISA solution yet.

I did the same hard coding with the MARLISA in my local machine as the schema does not have proper building names. I will have to keep that in mind when I make a submission.

Was your MARLISA submission successfully evaluated?

About 1 year ago

Yes, but I had to hardcode everything because for some reason building_info and some environment parameters are not being passed to the model in the online evaluation.

About 1 year ago

See the issue here https://discourse.aicrowd.com/t/multi-agent-coordinator-and-orderenforcingwrapper/7978/19?u=felipe_b .

building_info and observation_space are not being passed online. They are not passed in the dictionary

About 1 year ago

Thanks @felipe_b for the heads up. I’ll keep that in mind before submission. I mainly focused on an evolutionary algorithm in phase 1.

About 1 year ago

You are welcome. Are you open to teaming up?

About 1 year ago

Sure why not. Let’s gooo!!!

About 1 year ago

Great do you have a discord? or some other way of connecting? mine is Felipe Bivort Haiek#7325
