Portfolio Project

Description

This project trains an agent to play the Atari Kung-Fu Master Gym environment using the parallel Advantage Actor-Critic (A2C) method.

First, frames are grayscaled, cropped, and stacked four at a time so that the agent can infer object velocity from a single observation. The agent network consists of three convolutional layers, each with 32 filters and ELU activations, followed by a dense layer with 128 units. Two output heads sit on top of this shared body: the actor head, with one unit per action available in the game, and the critic head, with a single unit that predicts the state value.

Training runs ten environments in parallel, which stabilizes learning by decorrelating the experience used for each gradient update. In every environment the agent applies the advantage actor-critic update to improve its rewards. Once the agent reaches a consistently high reward and the policy entropy approaches zero, the policy is considered learned.
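The preprocessing step described above (grayscale, crop, stack of four frames) can be sketched as follows. This is a minimal NumPy illustration, not the project's actual code: the crop bounds `CROP_TOP`/`CROP_BOTTOM` are hypothetical values for the standard 210x160 Atari screen, and the real project may crop and rescale differently.

```python
import numpy as np
from collections import deque

# Hypothetical crop bounds for the 210x160 Atari screen; the project's
# actual values may differ.
CROP_TOP, CROP_BOTTOM = 60, 210

def preprocess(frame):
    """Grayscale and crop a single RGB Atari frame (H, W, 3) -> (H', W)."""
    gray = frame.mean(axis=2).astype(np.float32) / 255.0  # simple channel-mean grayscale
    return gray[CROP_TOP:CROP_BOTTOM, :]                  # keep only the playfield rows

class FrameStack:
    """Keeps the last four preprocessed frames so velocity is observable."""
    def __init__(self, k=4):
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        f = preprocess(frame)
        for _ in range(self.frames.maxlen):  # fill the stack with the first frame
            self.frames.append(f)
        return self.observation()

    def step(self, frame):
        self.frames.append(preprocess(frame))  # oldest frame drops out automatically
        return self.observation()

    def observation(self):
        # Stack along a new leading channel axis: (4, H', W)
        return np.stack(self.frames, axis=0)

# Example with a dummy frame of Atari screen size:
stack = FrameStack()
obs = stack.reset(np.zeros((210, 160, 3), dtype=np.uint8))
print(obs.shape)  # (4, 150, 160)
```

Stacking along the channel axis lets the convolutional layers see four consecutive time steps at once, which is what makes velocity recoverable from a single network input.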
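The advantage actor-critic update used during training can be summarized by its loss terms. The sketch below shows only the loss arithmetic on a toy batch in NumPy; the real project computes these quantities through the convolutional network and backpropagates them, and the coefficients (`0.5` on the critic term, `entropy_coef`) are common defaults assumed here, not values taken from the project.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def a2c_losses(logits, values, actions, returns, entropy_coef=0.01):
    """Compute the combined A2C loss and the mean policy entropy for a batch.

    logits : (B, n_actions) raw actor-head outputs
    values : (B,)           critic-head predictions V(s)
    actions: (B,)           actions actually taken
    returns: (B,)           empirical discounted returns R
    """
    probs = softmax(logits)
    logp = np.log(probs[np.arange(len(actions)), actions])
    advantage = returns - values             # A = R - V(s)
    actor_loss = -(logp * advantage).mean()  # policy-gradient term
    critic_loss = ((returns - values) ** 2).mean()
    entropy = -(probs * np.log(probs)).sum(axis=1).mean()
    # The entropy bonus encourages exploration; near-zero entropy means the
    # policy has become (almost) deterministic, the convergence signal
    # mentioned above.
    total = actor_loss + 0.5 * critic_loss - entropy_coef * entropy
    return total, entropy

# Toy batch of two transitions with three available actions:
logits = np.array([[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
values = np.array([0.5, 0.5])
actions = np.array([0, 1])
returns = np.array([1.0, 0.0])
loss, entropy = a2c_losses(logits, values, actions, returns)
print(loss, entropy)
```

In the parallel setup, each of the ten environments contributes transitions to one such batch, so a single update averages over decorrelated experience rather than one environment's trajectory.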