Created a deep neural network that plays 3D Tic Tac Toe as the final project for the Dynamic Programming and Reinforcement Learning course at LUMS.

Two instances of the neural network assume the roles of the X and O players. Both players start with a high probability of exploration and a very low probability of exploitation. During exploration, an action is selected at random; during exploitation, it is selected based on the Q-values predicted by the neural network. To improve results, any action that results in an immediate win, or blocks an immediate loss, is given priority over the randomly selected or predicted action. The probability of exploration decays over time, and the models are retrained after each game.
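The selection logic above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the board encoding, the `lines` list of winning index sets, and the `q_values` lookup are assumptions standing in for the real representation and network output.

```python
import random

def immediate_win_move(board, player, lines):
    """Return a move that completes a winning line for `player`, if any.

    `board` is assumed to be a flat list with 0 for empty cells and
    +1/-1 for the two players; `lines` enumerates all winning index sets.
    """
    for line in lines:
        vals = [board[i] for i in line]
        if vals.count(player) == len(line) - 1 and vals.count(0) == 1:
            return line[vals.index(0)]
    return None

def choose_action(board, player, legal_moves, q_values, epsilon, lines):
    """Epsilon-greedy selection with the win/block priority override."""
    # Priority 1: take an immediate win if one exists.
    move = immediate_win_move(board, player, lines)
    if move is not None:
        return move
    # Priority 2: block the opponent's immediate win.
    move = immediate_win_move(board, -player, lines)
    if move is not None:
        return move
    # Otherwise explore with probability epsilon, else exploit argmax Q.
    if random.random() < epsilon:
        return random.choice(legal_moves)
    return max(legal_moves, key=lambda m: q_values[m])
```

In self-play, each player would call `choose_action` on its turn with its own Q-values, and `epsilon` would be decayed after every game so that play shifts from random exploration toward exploiting the learned Q-values.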

At some point, the model for player O learned to fork the opponent, increasing its chances of winning. After training for ten thousand games, the neural networks entered a competition against the models of my peers, where they secured three wins and one draw while losing only a single match.

The code is available on my GitHub.