The Quantum Approximate Optimization Algorithm (QAOA) is one of the leading candidates for demonstrating quantum advantage. The quality of the solution obtained by QAOA depends on the performance of the classical routine used to optimize the variational parameters. In this work, we propose a Reinforcement Learning (RL) based approach that drastically reduces the number of evaluations needed to find high-quality variational parameters. We train an RL agent on small 8-qubit Max-Cut problem instances on Bebop, an Intel Xeon Phi based supercomputer, and transfer the learned optimization policy to quickly find high-quality solutions for larger problem instances drawn from different distributions and graph classes. Preliminary results show that our RL-based approach improves the quality of the obtained solution by up to 10% within a fixed budget of function evaluations and demonstrates that the learned optimization policy transfers across graph classes and sizes.
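For context on the objective that the classical optimizer (and the RL agent replacing it) repeatedly evaluates, below is a minimal sketch of a depth-1 QAOA Max-Cut expectation. The triangle graph, the plain NumPy statevector simulation, and the function name `qaoa_expectation` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Toy instance (not from the paper): Max-Cut on a 3-node triangle graph.
edges = [(0, 1), (1, 2), (0, 2)]
n = 3
dim = 2 ** n

# Diagonal of the Max-Cut cost operator: cut value of each bitstring.
cost = np.zeros(dim)
for z in range(dim):
    bits = [(z >> q) & 1 for q in range(n)]
    cost[z] = sum(bits[u] != bits[v] for u, v in edges)

def qaoa_expectation(gamma, beta):
    """Expected cut value of the depth-1 QAOA state |gamma, beta>."""
    state = np.full(dim, 1.0 / np.sqrt(dim), dtype=complex)  # |+>^n
    state = state * np.exp(-1j * gamma * cost)               # cost layer e^{-i gamma C}
    # Mixer layer: e^{-i beta X} applied to every qubit.
    c, s = np.cos(beta), -1j * np.sin(beta)
    for q in range(n):
        psi = state.reshape(2 ** (n - 1 - q), 2, 2 ** q)
        out = np.empty_like(psi)
        out[:, 0, :] = c * psi[:, 0, :] + s * psi[:, 1, :]
        out[:, 1, :] = s * psi[:, 0, :] + c * psi[:, 1, :]
        state = out.reshape(dim)
    return float(np.real(np.sum(np.abs(state) ** 2 * cost)))

# At (gamma, beta) = (0, 0) the state is unchanged, so the expectation
# equals the average cut over all 8 bitstrings: 1.5 for the triangle.
print(qaoa_expectation(0.0, 0.0))  # -> 1.5
```

Each call to `qaoa_expectation` corresponds to one "function evaluation" in the abstract's budget; an RL-trained policy proposes the next `(gamma, beta)` pair instead of a generic optimizer.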