Adding Simulation Mechanics

This week I continued to work on my soccer team simulation. I focused on improving the mechanics of the game – adding an energy level tracker for each player, and adding in the ability to steal the ball from the opposing team. I also set up training on AWS.

Adding Energy

A bar over each player’s head shows its current energy. As the players tire, they run more slowly and lose the ball more often.

Previously, I encouraged the players to conserve energy by adding a slight negative reward for high speeds. I realized this simplistic version wasn’t an accurate enough model, so I now track an energy level for each agent. When a player runs faster or dribbles the ball, its energy level decreases. When the player rests, its energy gradually recovers. Tired players have a lower maximum speed and are more likely to lose the ball to an opponent.

This new system allows players to make a burst of exertion, without penalty or discouragement. However, it forces them to ration this activity, and will hopefully push the players to pass more often, removing the need to add a specific reward for passing.
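
The energy mechanic can be sketched in a few lines of Python. This is only an illustration: the simulation itself runs in Unity, and the class name, the constants, and the linear speed scaling below are all assumptions, not the values used in the project.

```python
from dataclasses import dataclass


@dataclass
class PlayerEnergy:
    """Hypothetical per-player energy tracker; every constant is illustrative."""

    energy: float = 1.0                # 1.0 = fully rested, 0.0 = exhausted
    base_max_speed: float = 10.0       # top speed of a fully rested player
    min_speed_fraction: float = 0.5    # exhausted players keep half their top speed
    run_cost: float = 0.02             # energy drained per step at full effort
    dribble_cost: float = 0.01         # extra drain per step while dribbling
    rest_rate: float = 0.015           # energy restored per step while resting
    rest_threshold: float = 0.2        # effort below this counts as resting

    def max_speed(self) -> float:
        """Maximum speed shrinks linearly as energy drops."""
        fraction = self.min_speed_fraction + (1.0 - self.min_speed_fraction) * self.energy
        return self.base_max_speed * fraction

    def step(self, speed: float, dribbling: bool = False) -> None:
        """Update energy for one simulation step."""
        effort = min(abs(speed) / self.base_max_speed, 1.0)
        if effort < self.rest_threshold and not dribbling:
            delta = self.rest_rate  # resting: recover gradually
        else:
            delta = -(self.run_cost * effort + (self.dribble_cost if dribbling else 0.0))
        # Keep energy inside [0, 1].
        self.energy = min(1.0, max(0.0, self.energy + delta))
```

A player that sprints with the ball for several steps ends up with a lower `max_speed()`, and standing still restores energy one `rest_rate` increment at a time. No reward term is involved, which matches the idea that a burst of exertion is allowed but has to be rationed.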

Additionally, I added in the ability for a player to “tackle” a nearby player. Previously, a player could only take the ball if they collided directly with the ball. Now, even if the player only collides with the opposing player, they can steal the ball with some probability. (Currently, the goalie is successful in stealing the ball 95% of the time, and other players 60% of the time.) I give the players a small reward for tackling an opponent.

Initially, I set the tackling reward too high, and the players would form a red-blue pair and run down the field tackling each other over and over. After reducing the reward and adding an energy penalty for attempting a tackle, this quirk went away.
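
A rough Python sketch of the tackle rule, for illustration only. The 95% and 60% success rates come from the description above. The reward size, the energy cost, and the function name are hypothetical, and the sketch leaves out the effect of the ball carrier's tiredness.

```python
import random

# Success rates quoted in the post.
TACKLE_SUCCESS = {"goalie": 0.95, "field": 0.60}

# Hypothetical values, chosen only to make the sketch concrete.
TACKLE_REWARD = 0.05       # small reward for a successful tackle
TACKLE_ENERGY_COST = 0.1   # energy spent on every attempt, successful or not


def attempt_tackle(role, tackler_energy, rng=random):
    """Resolve one tackle attempt.

    Returns (stole_ball, reward, new_energy). `rng` is any object with a
    random() method returning a float in [0, 1).
    """
    # The attempt always costs energy, which discourages endless tackling.
    new_energy = max(0.0, tackler_energy - TACKLE_ENERGY_COST)
    stole_ball = rng.random() < TACKLE_SUCCESS[role]
    reward = TACKLE_REWARD if stole_ball else 0.0
    return stole_ball, reward, new_energy
```

Charging the energy cost on every attempt, while paying the reward only on success, is one way to keep repeated tackling from being profitable.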

Setting up Unity on AWS

This was surprisingly painless. The instructions are very well documented here. Unity has an AMI available (just make sure you are looking in the N. Virginia region). If you plan to do this, make sure to include “Linux build” when installing Unity.

In order to run on AWS, I had to automate the training process. I start the players and the goals out at 3x and 1.4x size respectively, and shrink them slowly through the training process. I also gradually decrease the players’ max speed, and I shift the rewards from completely individual to completely team-based. Currently, I begin with a reward for touching the ball and for passing, and I decrease those over time. All of these modifications are to counter the otherwise extremely sparse rewards. Now that I’m able to run things quickly on AWS, I’ll next try taking out each of these initial benefits, and I’ll try to pinpoint which (if any) are essential for fast training.
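
One way to automate such a schedule is to interpolate every setting against training progress. The Python sketch below assumes a linear schedule, final sizes of 1x, and shaping rewards that decay to zero. Only the 3x and 1.4x starting sizes come from the text above; the other numbers and all names are hypothetical.

```python
def anneal(start, end, progress):
    """Linearly interpolate from start to end as progress goes from 0 to 1."""
    progress = min(1.0, max(0.0, progress))
    return start + (end - start) * progress


def curriculum(step, total_steps):
    """Return the training settings for a given step (illustrative values)."""
    p = step / total_steps
    return {
        "player_scale": anneal(3.0, 1.0, p),        # 3x start is from the post
        "goal_scale": anneal(1.4, 1.0, p),          # 1.4x start is from the post
        "max_speed_scale": anneal(1.5, 1.0, p),     # hypothetical starting boost
        "team_reward_weight": anneal(0.0, 1.0, p),  # 0 = individual, 1 = team
        "touch_reward": anneal(0.1, 0.0, p),        # hypothetical shaping reward
        "pass_reward": anneal(0.2, 0.0, p),         # hypothetical shaping reward
    }


def blend_reward(individual, team, team_weight):
    """Mix individual and team rewards according to the current weight."""
    return (1.0 - team_weight) * individual + team_weight * team
```

Keeping each setting as its own entry makes it simple to pin one of them to its final value from the start, which is the kind of ablation needed to find out which of these aids actually matter.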

Communication

I’m still working on establishing communication between the players. Currently, the goalie outputs 6 floats, which are passed as vector observations to the defenders and attackers. However, at the moment I don’t have a good way to train these floats. (I’m also curious to experiment with whether the goalie could give very small rewards and penalties to the other players, to encourage them to follow his messages.)
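
The wiring itself is simple, as the Python sketch below shows. It covers only how a 6-float message might be appended to a teammate's observations, not how the message could be trained. The function name and the observation layout are assumptions.

```python
MESSAGE_SIZE = 6  # the goalie emits 6 floats per step


def build_observation(own_obs, goalie_message):
    """Append the goalie's message to a field player's own observations."""
    if len(goalie_message) != MESSAGE_SIZE:
        raise ValueError(f"expected {MESSAGE_SIZE} message floats, got {len(goalie_message)}")
    return list(own_obs) + list(goalie_message)
```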

Right now, I’ve moved over to a PyTorch Jupyter notebook to experiment with a small toy setup. If that’s successful, then I’ll try to bring it back over to TensorFlow. I’ll report more next week on how this goes.

Showing 5 comments
  • Jasper

    That looks impressive! How did you implement the ball stealing? That’s one thing I’ve been trying to do for days now.

    • mcleavey

      Thanks! I added in a concept of “possession”: I track when the player’s collider meets the ball’s collider (early on in training, I reward the players for possession, and I also use this to determine who kicked the ball out of bounds, so that the opposing team gets the ball for the throw-in). When a player has possession, I allow it to keep the ball directly in front of it, even when it spins around (just by moving the ball’s transform directly; I didn’t want to worry about modeling the specifics of soccer footwork 🙂 ). If another player comes close enough, I transfer possession to that player with a fixed probability. When I added in energy, I allowed that probability to change (so a tired player would be more likely to lose the ball to a player with energy). I don’t do anything fancy to switch possession, but now the new owner has the option to spin away and the ball will stay in front of it.

  • Jasper

    Ah okay, so in the end you did a direct handover of the ball. I did the same, but it resulted in a “possession” hang-up, with a group of players stuck in the corner. The energy stats are an interesting idea, but why did you not just limit the steps of the different player types?

    In my learning simulations, all of these rules like the “corner penalty” resulted in a sort of disaster after 2 million steps.
    In the end I just kept the visual changes, in order to get faster and better training results (it was not a good idea of the Unity guys to give the ball the same color pattern as the goal area, or to let the players see themselves in the reflection of the “transparent” walls), and started weighting the different player types, added a light ball, added reflection of corners, and so on.
    The different player types (defender, goalkeeper, striker) not only have different drag, mass, and max step count, but also different shapes. I hope the “communication” will be established this way.

    Cube: Goal Keeper
    Cylinder: Defender
    “Pill” form: Striker

    1x Team: 1x Cube, 1x Cylinder, 2x Pills

    The arena is similar to the one you have in your first article, but bigger and with no walls.

    https://www.dailymotion.com/video/x6rfhw5

  • Werner

    Interesting work.

    I may have missed it … I wonder, is there a summary description of the overall soccer project?

    We recently did a project on machine learning in soccer based on simple (but widely available) data.
    https://link.springer.com/journal/10994/topicalCollection/AC_5423abcd021c04f2678eea3a27580457

    • mcleavey

      Thanks! The way this summer scholars program is structured, we spend the first few weeks exploring, and then the second part of the summer working on a main project. I decided to focus on the music generation/language modeling work, so never had a chance to write up this RL exploration project properly. Now that the scholars program is wrapping up (this is already our last week!), I hope to circle back and write up the soccer project as well. Thanks for the link to your work – that looks great! I’ll take a closer look next week.
