Adding Simulation Mechanics
This week I continued to work on my soccer team simulation. I focused on improving the mechanics of the game: adding an energy level tracker for each player, and adding the ability to steal the ball from the opposing team. I also set up training on AWS.
Previously, I encouraged the players to conserve energy by adding a slight negative reward for fast velocities. I realized this simplistic version wasn’t an accurate enough model, so I now track an energy level for each agent. When a player runs faster or dribbles the ball, their energy decreases; when the player rests, it gradually recovers. Tired players have a reduced maximum speed, and are more likely to lose the ball to an opponent.
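The energy mechanic above can be sketched roughly as follows. This is a hypothetical illustration, not the actual Unity code: all constants (drain rates, recovery rate, the 50% speed floor, the ball-loss bonus) are assumed values for the sake of the example.

```python
class PlayerEnergy:
    """Illustrative per-agent energy tracker; all numeric constants are assumptions."""

    def __init__(self, max_energy=100.0, base_max_speed=10.0):
        self.max_energy = max_energy
        self.energy = max_energy
        self.base_max_speed = base_max_speed

    def step(self, speed, dribbling, dt=0.02):
        if speed > 0.1 or dribbling:
            # Exertion: drain scales with speed; dribbling adds a flat extra cost.
            drain = 0.5 * speed + (2.0 if dribbling else 0.0)
            self.energy -= drain * dt
        else:
            # Resting gradually restores energy.
            self.energy += 5.0 * dt
        self.energy = max(0.0, min(self.max_energy, self.energy))

    @property
    def max_speed(self):
        # Tired players have a reduced top speed (never below 50% here).
        frac = self.energy / self.max_energy
        return self.base_max_speed * (0.5 + 0.5 * frac)

    @property
    def ball_loss_bonus(self):
        # Lower energy -> higher chance of losing the ball in a challenge.
        return 0.3 * (1.0 - self.energy / self.max_energy)
```

The key property is that sprinting is never directly punished with reward; it only spends a resource that later limits the player.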
This new system allows players to make bursts of exertion without penalty or discouragement. However, it forces them to ration that activity, which will hopefully push the players to pass more often and remove the need for a specific passing reward.
Additionally, I added the ability for a player to “tackle” a nearby opponent. Previously, a player could only take the ball by colliding directly with the ball itself. Now, even if a player only collides with the opposing player, they can steal the ball with some probability. (Currently, the goalie is successful in stealing the ball 95% of the time, and other players 60% of the time.) I give the players a small reward for tackling an opponent.
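A minimal sketch of this tackle resolution, assuming a simple role-based lookup: the success rates (95% for the goalie, 60% for field players) come from the post, while the reward value, the energy cost, and the function shape are illustrative placeholders.

```python
import random

TACKLE_SUCCESS = {"goalie": 0.95, "field": 0.60}  # rates from the post
TACKLE_REWARD = 0.05       # small reward for a successful tackle (value assumed)
TACKLE_ENERGY_COST = 5.0   # energy penalty per attempt (value assumed)

def attempt_tackle(role, energy, rng=random.random):
    """Resolve a tackle when a player collides with an opposing ball carrier.

    Returns (success, reward, remaining_energy).
    """
    energy = max(0.0, energy - TACKLE_ENERGY_COST)  # attempting always costs energy
    success = rng() < TACKLE_SUCCESS[role]
    reward = TACKLE_REWARD if success else 0.0
    return success, reward, energy
```

Passing `rng` in makes the outcome easy to test deterministically, which also helps when debugging degenerate behaviors like the tackle-loop below.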
Initially, I set the tackling reward too high, and the players would form a red-blue pair and run down the field tackling each other over and over. After reducing the reward and adding an energy penalty for attempting a tackle, this quirk went away.
Setting up Unity on AWS
This was surprisingly painless. The instructions are very well documented here. Unity has an AMI available (just make sure you are looking in the N. Virginia region). If you plan to do this, be sure to include the “Linux build” component when installing Unity.
In order to run on AWS, I had to automate the training process. I start the players and the goals out at 3x and 1.4x size respectively, and shrink them slowly over the course of training. I also gradually decrease the players’ max speed, and I shift the rewards from completely individual to completely team-based. Currently, I begin with rewards for touching the ball and for passing, and I decrease those over time. All of these modifications counter the otherwise extremely sparse rewards. Now that I’m able to run things quickly on AWS, I’ll next try taking out each of these initial benefits, to pinpoint which (if any) are essential for fast training.
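The curriculum above could be expressed as a single schedule function. This is a sketch under assumptions: the 3x and 1.4x starting scales match the post, but the linear interpolation, the speed endpoints, and the shaping-reward magnitudes are invented for illustration.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t

def curriculum(progress):
    """Return environment parameters for a given training progress in [0, 1].

    Endpoints at progress=0 reflect the post's starting values (3x players,
    1.4x goals, individual rewards); everything else is an assumed schedule.
    """
    t = min(max(progress, 0.0), 1.0)
    return {
        "player_scale": lerp(3.0, 1.0, t),
        "goal_scale": lerp(1.4, 1.0, t),
        "max_speed": lerp(15.0, 10.0, t),        # illustrative speed values
        "team_reward_weight": t,                 # 0 = individual, 1 = team-based
        "touch_ball_reward": lerp(0.1, 0.0, t),  # decayed shaping rewards
        "pass_reward": lerp(0.1, 0.0, t),
    }
```

Keeping every scheduled quantity in one function makes the planned ablations easy: pin any one entry to its final value and see whether training still converges quickly.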
I’m still working on establishing communication between the players. Currently, the goalie outputs 6 floats, which are passed as vector observations to the defenders and attackers. However, at the moment I don’t have a good way to train these floats. (I’m also curious to experiment with whether the goalie could give very small rewards and penalties to the other players, to encourage them to follow his messages.)
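The plumbing side of this is simple: the 6-float message is just concatenated onto each teammate's observation vector. A minimal sketch (function names and shapes are assumptions; only the 6-float message size comes from the post):

```python
import numpy as np

MESSAGE_SIZE = 6  # number of floats the goalie broadcasts (from the post)

def build_observation(own_obs, goalie_message):
    """Append the goalie's message floats to a teammate's own observations."""
    own_obs = np.asarray(own_obs, dtype=np.float32)
    msg = np.asarray(goalie_message, dtype=np.float32)
    assert msg.shape == (MESSAGE_SIZE,), "goalie message must be 6 floats"
    return np.concatenate([own_obs, msg])
```

The hard part, as noted above, is not wiring the floats through but producing a gradient signal that trains them, which is what the toy PyTorch experiments are meant to explore.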
Right now, I’ve moved over to a PyTorch Jupyter notebook to experiment with a small toy setup. If that’s successful, I’ll try to bring it back over to TensorFlow. I’ll report more next week on how this goes.