NeurIPS 2021: MineRL Diamond Competition
Fully scripted baseline for the Intro track
Meet Bulldozer the lumberjack
Introduction¶
This notebook is part two of the Intro track baselines for the MineRL 2021 competition.
Below you will find a fully scripted agent that has two components:
- Bulldozer the lumberjack - a script that simply digs forward with occasional jumps and random 90-degree turns.
- A script that crafts a wooden pickaxe and digs down to get some cobblestone.
Script #1 runs until a set number of logs has been collected, then script #2 takes over. When evaluated on the MineRLObtainDiamond environment, the agent achieves an average reward of 4.0.
In part three we will replace script #1 above with a machine learning model!
Setup¶
%%capture
!sudo add-apt-repository -y ppa:openjdk-r/ppa
!sudo apt-get purge openjdk-*
!sudo apt-get install openjdk-8-jdk
!sudo apt-get install xvfb xserver-xephyr vnc4server python-opengl ffmpeg
%%capture
!pip3 install --upgrade minerl
!pip3 install pyvirtualdisplay
!pip3 install -U colabgymrender
Import libraries¶
import random
import gym
import minerl
from tqdm.notebook import tqdm
from colabgymrender.recorder import Recorder
from pyvirtualdisplay import Display
import logging
logging.disable(logging.ERROR) # reduce clutter; remove this line to see the error logs if something doesn't work.
Start of the agent code¶
def str_to_act(env, actions):
    """
    Simplifies specifying actions for the scripted part of the agent.

    Some examples for a string with a single action:
        'craft:planks'
        'camera:[10,0]'
        'attack'
        'jump'
        ''
    There should be no spaces in single actions, as we use spaces to
    separate actions with multiple "buttons" pressed:
        'attack sprint forward'
        'forward camera:[0,10]'

    :param env: base MineRL environment.
    :param actions: string of actions.
    :return: dict action, compatible with the base MineRL environment.
    """
    act = env.action_space.noop()
    for action in actions.split():
        if ":" in action:
            k, v = action.split(':')
            if k == 'camera':
                # Parse '[pitch,yaw]' explicitly instead of calling eval().
                act[k] = [float(x) for x in v.strip('[]').split(',')]
            else:
                act[k] = v
        else:
            act[action] = 1
    return act
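As a quick illustration of the parsing rules above, the same splitting logic can be run against a plain dictionary standing in for `env.action_space.noop()` (the mock dict below is our own assumption for the sketch; the real noop contains every action key):

```python
import ast

def parse_actions(noop, actions):
    """Mimic str_to_act's parsing against a plain dict standing in for noop()."""
    act = dict(noop)
    for action in actions.split():
        if ":" in action:
            k, v = action.split(':')
            # ast.literal_eval is a safe way to turn '[10,0]' into a list
            act[k] = ast.literal_eval(v) if k == 'camera' else v
        else:
            act[action] = 1
    return act

# A minimal stand-in for env.action_space.noop():
noop = {'attack': 0, 'sprint': 0, 'forward': 0, 'camera': [0, 0], 'craft': 'none'}
print(parse_actions(noop, 'attack sprint forward'))  # three "buttons" pressed at once
print(parse_actions(noop, 'camera:[10,0]'))          # pitch down 10 degrees
print(parse_actions(noop, 'craft:planks'))           # enum-valued action
```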
Actions¶
Here's a list of all possible actions:
Dict(attack:Discrete(2),
back:Discrete(2),
camera:Box(low=-180.0, high=180.0, shape=(2,)),
craft:Enum(crafting_table,none,planks,stick,torch),
equip:Enum(air,iron_axe,iron_pickaxe,none,stone_axe,stone_pickaxe,wooden_axe,wooden_pickaxe),
forward:Discrete(2),
jump:Discrete(2),
left:Discrete(2),
nearbyCraft:Enum(furnace,iron_axe,iron_pickaxe,none,stone_axe,stone_pickaxe,wooden_axe,wooden_pickaxe),
nearbySmelt:Enum(coal,iron_ingot,none),
place:Enum(cobblestone,crafting_table,dirt,furnace,none,stone,torch),
right:Discrete(2),
sneak:Discrete(2),
sprint:Discrete(2))
Camera¶
Camera actions contain two values:
- Pitch (up/down), where up is negative, down is positive.
- Yaw (left/right), where left is negative, right is positive.
For example, moving the camera up by 10 degrees would be 'camera:[-10,0]'.
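The scripts below express large turns as several small per-tick camera moves (e.g. nine `'camera:[0,10]'` actions for a 90-degree right turn). A tiny helper, our own illustration rather than part of the baseline, makes that conversion explicit:

```python
def camera_turn(pitch_deg, yaw_deg, per_tick=10):
    """Split a total camera move into per-tick 'camera:[pitch,yaw]' strings.

    pitch_deg: total up/down rotation (negative = up).
    yaw_deg:   total left/right rotation (negative = left).
    This helper is illustrative and not part of MineRL itself.
    """
    actions = []
    # Handle pitch (index 0) and yaw (index 1) one axis at a time.
    for total, axis in ((pitch_deg, 0), (yaw_deg, 1)):
        step = per_tick if total > 0 else -per_tick
        for _ in range(abs(total) // per_tick):
            move = [0, 0]
            move[axis] = step
            actions.append(f'camera:[{move[0]},{move[1]}]')
    return actions

print(camera_turn(-30, 0))  # look up 30 degrees: ['camera:[-10,0]'] * 3
print(camera_turn(0, 90))   # turn right 90 degrees in 10-degree steps
```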
Change agent behaviour here¶
To change the sequence of actions that the agent performs, edit the code inside either the get_action_sequence_bulldozer()
or get_action_sequence()
function below. One action is executed per tick, and there are 20 ticks per second in a regular Minecraft game.
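Because one action is consumed per tick at 20 ticks per second, padding the sequence with empty actions is how the scripts wait. A one-line conversion (our own convenience, not part of the baseline) shows where counts like `[''] * 100` come from:

```python
TICKS_PER_SECOND = 20  # Minecraft's fixed tick rate

def seconds_to_ticks(seconds):
    """Number of actions needed to span the given in-game duration."""
    return int(seconds * TICKS_PER_SECOND)

# [''] * 100 in the scripts below corresponds to a 5-second pause:
print(seconds_to_ticks(5))    # 100
print(seconds_to_ticks(0.5))  # 10
```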
def get_action_sequence_bulldozer():
"""
Specify the action sequence for Bulldozer, the scripted lumberjack.
"""
action_sequence_bulldozer = []
action_sequence_bulldozer += [''] * 100 # wait 5 secs
action_sequence_bulldozer += ['camera:[10,0]'] * 3 # look down 30 degrees
for _ in range(100):
action_sequence_bulldozer += ['attack sprint forward'] * 100 # dig forward for 5 secs
action_sequence_bulldozer += ['jump'] # jump!
action_sequence_bulldozer += ['attack sprint forward'] * 100
action_sequence_bulldozer += ['jump']
action_sequence_bulldozer += ['attack sprint forward'] * 100
if random.random() < 0.5: # turn either 90 degrees left or 90 degrees right with an equal probability
action_sequence_bulldozer += ['camera:[0,-10]'] * 9
else:
action_sequence_bulldozer += ['camera:[0,10]'] * 9
return action_sequence_bulldozer
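Note that the full Bulldozer sequence is far longer than the episode cap, which is why the run loop later slices it with `[:MAX_TEST_EPISODE_LEN]`. A quick count, mirroring the construction above, shows the total length:

```python
# Bulldozer's sequence: 100 waits + 3 camera moves up front, then 100
# iterations of (3 * 100 digs + 2 jumps + 9 camera turns) = 311 actions each.
warmup = 100 + 3
per_iteration = 100 + 1 + 100 + 1 + 100 + 9
total = warmup + 100 * per_iteration
print(total)  # 31203 actions, far beyond MAX_TEST_EPISODE_LEN = 5000
```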
def get_action_sequence():
"""
Specify the action sequence for the agent to execute.
"""
# make planks, sticks, crafting table and wooden pickaxe:
action_sequence = []
action_sequence += [''] * 100
action_sequence += ['craft:planks'] * 4
action_sequence += ['craft:stick'] * 2
action_sequence += ['craft:crafting_table']
action_sequence += ['camera:[10,0]'] * 18
action_sequence += ['attack'] * 20
action_sequence += [''] * 10
action_sequence += ['jump']
action_sequence += [''] * 5
action_sequence += ['place:crafting_table']
action_sequence += [''] * 10
# bug: looking straight down at a crafting table doesn't let you craft. So we look up a bit before crafting.
action_sequence += ['camera:[-1,0]']
action_sequence += ['nearbyCraft:wooden_pickaxe']
action_sequence += ['camera:[1,0]']
action_sequence += [''] * 10
action_sequence += ['equip:wooden_pickaxe']
action_sequence += [''] * 10
# dig down:
action_sequence += ['attack'] * 600
action_sequence += [''] * 10
return action_sequence
Parameters¶
# Parameters:
TEST_EPISODES = 5 # number of episodes to test the agent for.
MAX_TEST_EPISODE_LEN = 5000 # 18k is the default for MineRLObtainDiamond.
N_WOOD_THRESHOLD = 4 # number of wood logs to get before starting script #2.
Start Minecraft¶
display = Display(visible=0, size=(400, 300))
display.start();
env = gym.make('MineRLObtainDiamond-v0')
env = Recorder(env, './video', fps=60)
Run your agent¶
As the code below runs, you should see episode videos and rewards appear. You can run the cell below multiple times to see different episodes.
for episode in range(TEST_EPISODES):
obs = env.reset();
done = False
total_reward = 0
steps = 0
action_sequence_bulldozer = get_action_sequence_bulldozer()
action_sequence = get_action_sequence()
# scripted part to get some logs:
for j, action in enumerate(tqdm(action_sequence_bulldozer[:MAX_TEST_EPISODE_LEN])):
obs, reward, done, _ = env.step(str_to_act(env, action))
total_reward += reward
steps += 1
if obs['inventory']['log'] >= N_WOOD_THRESHOLD:
break
if done:
break
# scripted part to use the logs:
if not done:
for i, action in enumerate(tqdm(action_sequence[:MAX_TEST_EPISODE_LEN - j])):
obs, reward, done, _ = env.step(str_to_act(env, action))
total_reward += reward
steps += 1
if done:
break
env.release()
env.play()
print(f'Episode #{episode+1} reward: {total_reward}\t\t episode length: {steps}\n')