In this post, I will explore the concepts in the paper Deterministic Policy Gradient Algorithms (Silver et al.), implement the COPDAC (Compatible Off-Policy Deterministic Actor-Critic) algorithm proposed in the paper, and train an agent on the continuous-control environment MountainCar.

Introduction

In the paper Deterministic Policy Gradient Algorithms, Silver et al. propose a…