Original post here: http://www.pinchofintelligence.com/human-input-when-reinforcement-learning/
During the “Twitch Plays Pokémon” (https://en.wikipedia.org/wiki/Twitch_Plays_Pok%C3%A9mon) hype there were between 60,000 and 70,000 viewers watching and steering an agent playing the game Pokémon. If we could use all of this human input to train a machine learning algorithm, we could achieve some great results.
The game I wanted the algorithm to learn to play was Mario, because of the available open-source platform and the large number of existing solutions. In the end, an existing reinforcement-learning implementation was used. The traditional problem with reinforcement learning is that the agent chooses many actions that we as humans know will lead to the death of Mario. This was solved by letting the algorithm learn from the actions that a normally playing user would take. While the user played our game on a projector, the algorithm constantly evaluated whether their action was beneficial and learned from it.
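To make the idea of learning from a human's actions concrete, here is a minimal sketch. It is not the implementation used in the project: it assumes tabular Q-learning, and the toy level (`step`), the scripted player (`human_action`), and all constants are hypothetical stand-ins for the Mario emulator and a real player. The point it illustrates is that the human chooses every action, while the learner scores each choice with a temporal-difference update.

```python
from collections import defaultdict

ACTIONS = ["left", "right", "jump"]
PIT, GOAL = 3, 5  # hypothetical toy level: a pit at tile 3, the flag at tile 5


def step(pos, action):
    """Toy stand-in for the Mario emulator: returns (next_pos, reward, done)."""
    move = {"left": -1, "right": 1, "jump": 2}[action]
    nxt = max(0, pos + move)
    if nxt == PIT:
        return nxt, -1.0, True   # Mario falls into the pit
    if nxt >= GOAL:
        return GOAL, 1.0, True   # level finished
    return nxt, 0.0, False


def human_action(pos):
    """Stand-in for the player's input: jump right before the pit, else walk."""
    return "jump" if pos == PIT - 1 else "right"


class QLearner:
    def __init__(self, alpha=0.5, gamma=0.9):
        self.alpha, self.gamma = alpha, gamma
        self.q = defaultdict(float)  # (state, action) -> estimated value

    def update(self, state, action, reward, next_state, done):
        # Evaluate the observed action: compare its current estimate with
        # the reward plus the discounted best value of the next state.
        best_next = 0.0 if done else max(self.q[(next_state, a)] for a in ACTIONS)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error

    def greedy_action(self, state):
        return max(ACTIONS, key=lambda a: self.q[(state, a)])


def train_from_human(episodes=20):
    learner = QLearner()
    for _ in range(episodes):
        pos, done = 0, False
        while not done:
            action = human_action(pos)  # the human, not the agent, picks the action
            nxt, reward, done = step(pos, action)
            learner.update(pos, action, reward, nxt, done)
            pos = nxt
    return learner


learner = train_from_human()
# Show what the agent would now do by itself on the tiles the human visited.
print([learner.greedy_action(p) for p in (0, 1, 2, 4)])
```

Because the update is off-policy, the learner never has to fall into the pit itself to discover that jumping at tile 2 pays off; it only needs to observe the human doing so. In a real setup the agent's own exploratory actions would presumably be mixed with the human's.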
At the BNAIC 2014 conference this game was playable by everybody who downloaded an Android or iPhone app. During the conference the agent learned faster than without human input. My accepted paper can be found here: http://bnaic2014.org/?page_id=154 .