This is Quantic’s latest article, “Quantum observables for continuous control of the Quantum Approximate Optimization Algorithm via Reinforcement Learning”, by A. Garcia-Saez & J. Riu, available on arXiv.

The article presents a classical optimization strategy for the Quantum Approximate Optimization Algorithm (QAOA) based on Reinforcement Learning (RL). The method is tested on several instances of the MaxCut problem.
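For reference, the standard QAOA formulation of MaxCut on a graph $G = (V, E)$ with $n = |V|$ vertices (one qubit per vertex) is sketched below. The objective Hamiltonian $C$ counts the edges cut by a bitstring, $B$ is the usual transverse-field mixer, and each of the $p$ steps applies the pair $e^{-i\gamma_k C}$, $e^{-i\beta_k B}$. These are the textbook conventions; the article's exact normalization and signs may differ.

```latex
C = \sum_{(i,j) \in E} \frac{1 - Z_i Z_j}{2}, \qquad B = \sum_{i=1}^{n} X_i,

|\boldsymbol{\gamma}, \boldsymbol{\beta}\rangle
  = e^{-i\beta_p B} e^{-i\gamma_p C} \cdots e^{-i\beta_1 B} e^{-i\gamma_1 C} \, |+\rangle^{\otimes n},
\qquad
F_p(\boldsymbol{\gamma}, \boldsymbol{\beta})
  = \langle \boldsymbol{\gamma}, \boldsymbol{\beta} | \, C \, | \boldsymbol{\gamma}, \boldsymbol{\beta} \rangle .
```

Maximizing $F_p$ over the $2p$ angles $(\gamma_k, \beta_k)$ is the classical optimization task that the RL agent takes over.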

In general, RL approaches are built on discrete-time interactions between an agent and an environment. The agent receives a partial or complete observation of the environment and acts on it so as to maximize the reward it collects.

The QAOA is implemented such that, at each step of an episode of arbitrary but fixed length *p*, a pair of parameter-dependent unitary transformations is applied to a quantum state. The values of the parameters are selected by the Deep RL agent, whose neural network takes as input a set of measurements of the quantum state: the expectation values of the X and Z operators on each qubit, as well as the expectation values of the individual clauses of the objective Hamiltonian.
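To make this loop concrete, here is a minimal NumPy statevector sketch for small MaxCut instances. It assumes the standard QAOA choices (initial state |+⟩^n, phase separator e^{-iγC} with C counting cut edges, mixer e^{-iβ Σ X_q}) and takes each clause observable to be ⟨(1 − Z_i Z_j)/2⟩ for one edge. These conventions, the function names, and the random `policy` that stands in for the neural-network agent are illustrative assumptions, not the authors' code. The episode ends by returning ⟨C⟩, the quantity the article uses as the terminal reward.

```python
import numpy as np

def basis_bits(n):
    # bits[k, q] is the value of qubit q in basis state |k> (qubit q = bit q of k; our convention).
    return (np.arange(2 ** n)[:, None] >> np.arange(n)) & 1

def maxcut_cost_diagonal(n, edges):
    # Diagonal of C = sum_{(i,j) in E} (1 - Z_i Z_j) / 2: the number of edges cut by each bitstring.
    bits = basis_bits(n)
    return sum((bits[:, i] != bits[:, j]).astype(float) for i, j in edges)

def qaoa_layer(state, n, cost_diag, gamma, beta):
    # Phase separator exp(-i*gamma*C): C is diagonal, so this is an elementwise phase.
    state = np.exp(-1j * gamma * cost_diag) * state
    # Mixer exp(-i*beta*sum_q X_q) = prod_q (cos(beta) I - i sin(beta) X_q).
    # X_q flips bit q, so (X_q psi)[k] = psi[k XOR 2^q].
    idx = np.arange(2 ** n)
    for q in range(n):
        state = np.cos(beta) * state - 1j * np.sin(beta) * state[idx ^ (1 << q)]
    return state

def observations(state, n, edges):
    # Observation vector: <X_q> and <Z_q> for every qubit, then <(1 - Z_i Z_j)/2> for every edge.
    bits = basis_bits(n)
    probs = np.abs(state) ** 2
    idx = np.arange(2 ** n)
    x_exp = [np.vdot(state, state[idx ^ (1 << q)]).real for q in range(n)]
    z_exp = [probs @ (1 - 2 * bits[:, q]) for q in range(n)]
    clause_exp = [probs @ (bits[:, i] != bits[:, j]).astype(float) for i, j in edges]
    return np.array(x_exp + z_exp + clause_exp)

def run_episode(n, edges, p, policy):
    # One episode of fixed length p, starting from |+>^n.
    cost = maxcut_cost_diagonal(n, edges)
    state = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)
    for step in range(p):
        # The agent maps the current observations to the two angles of this step.
        gamma, beta = policy(observations(state, n, edges), step)
        state = qaoa_layer(state, n, cost, gamma, beta)
    # Terminal reward: expectation value <psi_p|C|psi_p> of the objective Hamiltonian.
    return float(np.abs(state) ** 2 @ cost)

# Usage on a hypothetical toy instance: a 4-cycle, whose maximum cut is 4.
rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
random_policy = lambda obs, step: rng.uniform(0, np.pi, size=2)  # stand-in for the trained agent
print(run_episode(4, edges, p=3, policy=random_policy))
```

The dense simulation scales as 2^n in memory and is meant only for small illustrative graphs; a trained agent would replace `random_policy` with a network that reads the observation vector.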

At the end of each episode, the agent receives a reward equal to the expectation value of the objective Hamiltonian in the final quantum state of the environment. Results for an instance of a 3-regular graph with 13 vertices are shown in the following figure:

Moreover, an incremental training strategy, which allows the agent to reach longer episode lengths *p' > p*, is successfully applied to graphs with 21 qubits and *p* up to 25: