Challenges Entered
Learning From Human-Feedback

Specialize and Bargain in Brave New Worlds
Latest submissions: graded 197092, graded 196866, graded 196863

ASCII-rendered single-player dungeon crawl game
Latest submissions: graded 149038, graded 148738, graded 147238

Sample Efficient Reinforcement Learning in Minecraft

Measure sample efficiency and generalization in reinforcement learning using procedurally generated environments
Latest submissions: graded 93478, graded 93477, graded 93390

Multi-Agent Reinforcement Learning on Trains

Sample-efficient reinforcement learning in Minecraft
Latest submissions: graded 120617, graded 120492, failed 120483

Sample-efficient reinforcement learning in Minecraft
Latest submissions: graded 25413, graded 25412, graded 25075

Multi-Agent Reinforcement Learning on Trains

A new benchmark for Artificial Intelligence (AI) research in Reinforcement Learning
Latest submissions: graded 8563, graded 8534, failed 8533

Predict if users will skip or listen to the music they're streamed

Multi-Agent Reinforcement Learning on Trains
- BeepBoop (NeurIPS 2020: MineRL Competition)
- Chaotic-Dwarven-GPT-5 (NeurIPS 2021 - The NetHack Challenge)
NeurIPS 2021: MineRL Diamond Competition
MineRL self._actions
About 3 years ago
The docstring of the ActionShaping() class should be enough to figure out how to adjust the actions for the RL part of the algorithm. What changes do you want to make, and what have you tried?
Maybe playing Minecraft for a bit or watching a YouTube guide would help with the Minecraft knowledge?
Questions about the environment that can be used to train the model
About 3 years ago
Yes, you can use the *DenseVectorObf environments in the Research track of the competition.
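Concretely, that means something like the following minimal sketch (the env id here is an assumption; substitute whichever *DenseVectorObf id the Research track actually lists):

import gym
import minerl  # importing minerl registers the MineRL environments with gym

# Assumed Research-track id; pick the one you want to train on.
env = gym.make("MineRLObtainDiamondDenseVectorObf-v0")
obs = env.reset()
# In the Obf envs both observations and actions carry an obfuscated 64-dim
# "vector" field instead of human-readable keys.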
NeurIPS 2020: MineRL Competition
Obfuscated actions + KMeans analysis
Over 3 years ago
Here's some analysis our team did on the whole obfuscated action + KMeans thing:
A teaser: sometimes the agents don't have a single action to look up. So shy…
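The write-up itself isn't reproduced here, but the usual obfuscated action + KMeans recipe is roughly the following sketch, assuming the MineRLObtainDiamondVectorObf-v0 dataset has already been downloaded into data/ (the env id, data_dir, sample budget and cluster count are all assumptions):

import numpy as np
import minerl
from sklearn.cluster import KMeans

# Assumed dataset and location; use whatever you fetched with minerl.data.download().
data = minerl.data.make("MineRLObtainDiamondVectorObf-v0", data_dir="data")

# Gather a sample of the 64-dim obfuscated action vectors from the demonstrations.
vectors = []
for _, actions, _, _, _ in data.batch_iter(batch_size=4, seq_len=64, num_epochs=1):
    vectors.append(actions["vector"].reshape(-1, 64))
    if sum(v.shape[0] for v in vectors) >= 100_000:
        break
vectors = np.concatenate(vectors)

# Cluster them into a small discrete action set; the agent then picks one centroid per step.
kmeans = KMeans(n_clusters=100, random_state=0).fit(vectors)
discrete_actions = kmeans.cluster_centers_

At act time the chosen centroid gets wrapped back into {'vector': centroid} before env.step().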
Error using gym.make
Over 4 years ago
Working Colab example (credit to @tviskaron):

!java -version
!sudo apt-get purge openjdk-*
!java -version
!sudo apt-get install openjdk-8-jdk
!pip3 install --upgrade minerl
!sudo apt-get install xvfb xserver-xephyr vnc4server
!sudo pip install pyvirtualdisplay

from pyvirtualdisplay import Display

# MineRL needs a display; create a virtual one since Colab is headless.
display = Display(visible=0, size=(640, 480))
display.start()

import minerl
import gym

env = gym.make("MineRLNavigateDense-v0")
obs = env.reset()
done = False
net_reward = 0

for _ in range(100):
    # Simple scripted policy: run forward, jump, attack, and steer towards the compass target.
    action = env.action_space.noop()
    action['camera'] = [0, 0.03 * obs["compassAngle"]]
    action['back'] = 0
    action['forward'] = 1
    action['jump'] = 1
    action['attack'] = 1
    obs, reward, done, info = env.step(action)
    net_reward += reward
    print("Total reward: ", net_reward)

env.close()
NeurIPS 2020: Procgen Competition
How to find subtle implementation details
Almost 4 years ago
It could be the weight initialization, as PyTorch uses he_uniform by default and TensorFlow uses glorot_uniform. Using TensorFlow with glorot_uniform I get a score of 42 on starpilot, while using TensorFlow with he_uniform I get 19.
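For anyone who wants to test this, a minimal sketch of forcing Glorot (Xavier) uniform init in PyTorch to mimic the TensorFlow default (the tiny model below is just a stand-in, not the actual baseline network):

import torch.nn as nn

def glorot_init(module):
    # TensorFlow-style defaults: Glorot uniform weights, zero biases.
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# Stand-in network for a 64x64x3 procgen frame with 15 discrete actions.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 62 * 62, 15),
)
model.apply(glorot_init)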
Round 2 is open for submissions
About 4 years ago
Sounds good, thanks @shivam. Could you please also give us the normalization factors (Rmin, Rmax) for the 4 private envs?
Round 2 is open for submissions
About 4 years ago
Will we be able to choose which submission to use for the final 16+4 evaluation? It might be the case that our best solution, tested locally on 16 envs, is not the same as the best one for the 6+4 envs on the public LB.
Human score
About 4 years ago
So I was a little bored and decided to see how well I could play the procgen games myself.
Setup:
python -m procgen.interactive --distribution-mode easy --vision agent --env-name coinrun
First I tried each game for 5-10 episodes to figure out what the keys do, how the game works, etc.
Then I played each game 100 times and logged the rewards. Here are the results:
Environment | Mean reward | Mean normalized reward |
---|---|---|
bigfish | 29.40 | 0.728 |
bossfight | 10.15 | 0.772 |
caveflyer | 11.69 | 0.964 |
chaser | 11.23 | 0.859 |
climber | 12.34 | 0.975 |
coinrun | 9.80 | 0.960 |
dodgeball | 18.36 | 0.963 |
fruitbot | 25.15 | 0.786 |
heist | 10.00 | 1.000 |
jumper | 9.20 | 0.911 |
leaper | 9.90 | 0.988 |
maze | 10.00 | 1.000 |
miner | 12.27 | 0.937 |
ninja | 8.60 | 0.785 |
plunder | 29.46 | 0.979 |
starpilot | 33.15 | 0.498 |
The mean normalized score over all games was 0.882. It stayed relatively constant throughout the 100 episodes, i.e. I didn't improve much while playing.
I'm not sure how useful this result would be as a "human benchmark" though - I could easily achieve a ~1.000 score given enough time to think on each frame. Also, human visual reaction time is ~250 ms, which at 15 fps would translate to us being at least 4 frames behind on our actions, which can matter for games like starpilot, chaser and some others.
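For reference, the normalized numbers above follow the standard Procgen normalization (R - Rmin) / (Rmax - Rmin) with per-game constants; a quick sketch, taking the coinrun easy-mode constants to be Rmin = 5 and Rmax = 10, which reproduces the coinrun row:

def normalized_reward(r, r_min, r_max):
    # Procgen-style normalization: per-game constants map the raw return onto roughly [0, 1].
    return (r - r_min) / (r_max - r_min)

print(normalized_reward(9.80, 5, 10))  # 0.96, matching the coinrun row in the table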
How to save rollout video / render?
Over 4 years ago
Does it work properly for everyone else? When I run it for 100 episodes it only saves episodes number 0, 1, 8, 27, 64.
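Those are exactly the perfect cubes, which is gym's default video-recording schedule. If the rollouts come from gym's Monitor wrapper, a sketch like this records every episode instead (the wrapper and directory are assumptions about the setup; the starter kit may wire things up differently):

import gym
from gym.wrappers import Monitor

env = gym.make("procgen:procgen-coinrun-v0")
# By default Monitor records only cube-numbered episodes (0, 1, 8, 27, 64, ...);
# passing video_callable overrides that schedule.
env = Monitor(env, "./videos", video_callable=lambda episode_id: True, force=True)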
Same marks on the testing video
Over 4 years ago
It's the paint_vel_info flag that you can find under env_config in the .yaml files. There are also some flags that are not in the .yaml files but that people are using (use_monochrome_assets, use_backgrounds). You can find all of them if you scroll down here: https://github.com/openai/procgen .
Should we actually be allowed to change the environment? Maybe these settings should be reset when doing evaluation?
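For completeness, the same options can also be passed straight to gym.make when poking at the env locally; a sketch with example values (in the competition these would normally be set under env_config in the .yaml instead):

import gym

env = gym.make(
    "procgen:procgen-coinrun-v0",
    paint_vel_info=True,          # the velocity info painted into the frame (the "marks" in the video)
    use_backgrounds=False,        # plain backgrounds instead of themed ones
    use_monochrome_assets=False,  # full-colour sprites
)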
Unity Obstacle Tower Challenge
Submissions are stuck
Over 5 years ago
There was a mention that the final standings for round 2 would be based on more than 5 seeds, to get a proper average performance. Is that going to happen? I didn't try to repeatedly submit similar models to overfit the 5 seeds for that reason.
Is there any due date of GCP credit?
Over 5 years ago
Mine says it expires 28 May 2020; not sure if that's a set date or depends on when you redeem. I can't find the date of when I redeemed.
What reward does the agent receive for collecting a key?
Over 5 years ago
0.1, the same as a single door (there are 2 doors in each doorway).
Announcement: Debug your submissions
Over 5 years ago
And I was thinking I was going mad when my previously working submission suddenly broke after "disabling" debug.
Submission Failed: Evaluation Error
Over 5 years ago
Can't wait! I've been trying to get my Dopamine-trained agent scored (only 5-7 floors so far), but the only response I get after every change is
The following containers terminated prematurely. : agent
and it's not very helpful. It builds fine, but gets stuck in the evaluation phase.
Human Performance
Over 5 years ago
In the Obstacle Tower paper there is a section on human performance: 15 people tried it multiple times and the maximum floor reached was 22. Am I reading this right? I finished all 25 floors on my very first try without much trouble.
How far did everyone else get, and how many runs did you do? We could try collecting more data and make a more accurate human benchmark this way.
Notebooks
- Behavioural cloning baseline for the Research track · Research track baseline · karolisram · Over 3 years ago
- Behavioural cloning baseline for the Intro track · BC lumberjack plus script · karolisram · Over 3 years ago
- Fully scripted baseline for the Intro track · Meet Bulldozer the lumberjack · karolisram · Over 3 years ago
- Testing MineRL environment · Test the environment by running a fixed sequence of actions in a fixed world · karolisram · Over 3 years ago
MineRL self._actions
About 3 years ago
Ah, I see the issue now. I think the confusion comes from line 121 in RL_plus_script.py:
[('forward', 1), ('jump', 1)]
This line doesn't mean two actions, forward on the first tick and then jump on the next tick. Instead it means that the forward and jump keys are both pressed for a single tick.
You can see that by printing out act = env.action_space.noop(): this is a single action that does nothing, because none of the keys are pressed.
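If you then apply those pairs to it, along the lines of this small sketch (the loop is illustrative, not the exact code from RL_plus_script.py):

for key, value in [('forward', 1), ('jump', 1)]:
    act[key] = value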
act will become an action with those two buttons pressed. This is what the ActionShaping() wrapper does. To create meta-actions that perform 5 attacks and such, you will need to do something else; maybe frame skipping would be an easier way to achieve that?