This is Quantic’s latest article, “Quantum observables for continuous control of the Quantum Approximate Optimization Algorithm via Reinforcement Learning”, by A. Garcia-Saez and J. Riu, available on arXiv.

The article presents a classical optimization strategy for the Quantum Approximate Optimization Algorithm (QAOA) using Reinforcement Learning (RL). The algorithm is tested on several instances of the MAXCUT problem.

In general, RL approaches consist of discrete-time agent-environment interactions. The agent receives a partial or total observation of the environment and maximizes its reward by acting on it.

The QAOA is implemented such that, at each step of an episode of arbitrary but fixed length p, a pair of parameter-dependent unitary transformations is applied to a Quantum state.
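
As a rough illustration of one such step, the sketch below applies a pair of unitaries to a state vector with NumPy. It assumes the standard QAOA form for MAXCUT, namely a cost unitary exp(-i γ C), where C counts the cut edges, followed by a mixer unitary exp(-i β Σ X_q). The function names, the bit ordering and the choice of these two unitaries are illustrative assumptions, not details taken from the article.

```python
import numpy as np

def cut_values(n, edges):
    """Cut size of every basis state; bit q of the index gives the side of vertex q."""
    z = np.arange(2 ** n)
    bits = (z[:, None] >> np.arange(n)) & 1        # shape (2^n, n)
    cuts = np.zeros(2 ** n)
    for i, j in edges:
        cuts += bits[:, i] ^ bits[:, j]            # 1 when the edge is cut
    return cuts

def apply_cost(state, cuts, gamma):
    """exp(-i * gamma * C), which is diagonal in the computational basis."""
    return np.exp(-1j * gamma * cuts) * state

def apply_mixer(state, n, beta):
    """exp(-i * beta * sum_q X_q): the same 2x2 rotation applied to every qubit."""
    rx = np.array([[np.cos(beta), -1j * np.sin(beta)],
                   [-1j * np.sin(beta), np.cos(beta)]])
    psi = state.reshape([2] * n)
    for q in range(n):
        # Contract the rotation with axis q, then move the new axis back to q.
        psi = np.moveaxis(np.tensordot(rx, psi, axes=([1], [q])), 0, q)
    return psi.reshape(-1)

def qaoa_step(state, n, cuts, gamma, beta):
    """One episode step: cost unitary followed by mixer unitary (assumed order)."""
    return apply_mixer(apply_cost(state, cuts, gamma), n, beta)
```

In this picture, an episode of length p would call `qaoa_step` p times, each time with a new pair (γ, β) chosen by the agent.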

The Deep RL agent selects the values of the parameters, using as inputs to its Neural Network a set of measurements of the Quantum state: the expectation values of the X and Z operators for each qubit, as well as those of the individual clauses of the objective Hamiltonian.
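
To make the shape of that input concrete, the sketch below computes such an observation vector from a simulated state vector. It assumes that each clause of the MAXCUT Hamiltonian has the usual form (1 - Z_i Z_j)/2 for an edge (i, j). The function name `observation`, the ordering of the entries and the bit convention are hypothetical choices made for illustration.

```python
import numpy as np

def observation(state, n, edges):
    """Return [<X_q> for each qubit, <Z_q> for each qubit, <clause> for each edge]."""
    z = np.arange(2 ** n)
    probs = np.abs(state) ** 2
    # Z eigenvalue (+1 or -1) of qubit q in each basis state; bit q of the index.
    spins = 1 - 2 * ((z[:, None] >> np.arange(n)) & 1)      # shape (2^n, n)
    exp_z = probs @ spins
    # X_q flips bit q, so <X_q> = sum_z conj(psi[z]) * psi[z with bit q flipped].
    exp_x = np.array([np.real(np.vdot(state, state[z ^ (1 << q)]))
                      for q in range(n)])
    # Assumed clause form for MAXCUT: (1 - Z_i Z_j) / 2 for each edge (i, j).
    exp_clause = np.array([probs @ ((1 - spins[:, i] * spins[:, j]) / 2)
                           for i, j in edges])
    return np.concatenate([exp_x, exp_z, exp_clause])
```

Under these assumptions the vector has 2n + |E| entries for a graph with n vertices and |E| edges. On hardware these values would be estimated from repeated measurements rather than read off the state vector.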

At the end of each episode, the agent is rewarded with an amount equal to the expectation value of the objective Hamiltonian in the final Quantum state of the environment. Results for an instance of a 3-regular graph with 13 vertices are shown.
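
The end-of-episode reward can be sketched as follows for a small simulated instance. Several things here are assumptions rather than details from the article: the initial state is the uniform superposition, intermediate steps receive zero reward, the objective counts cut edges, and `policy` is a hypothetical stand-in for the trained agent that ignores observations and simply returns a (γ, β) pair for each step.

```python
import numpy as np
from functools import reduce

def run_episode(n, edges, policy, p):
    """Roll out p steps and return the per-step rewards (nonzero only at the end)."""
    z = np.arange(2 ** n)
    bits = (z[:, None] >> np.arange(n)) & 1
    cuts = np.zeros(2 ** n)
    for i, j in edges:
        cuts += bits[:, i] ^ bits[:, j]            # diagonal of the objective
    # Assumed initial state: uniform superposition over all bit strings.
    state = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)
    rewards = []
    for t in range(p):
        gamma, beta = policy(t)                    # placeholder for the agent
        rx = np.array([[np.cos(beta), -1j * np.sin(beta)],
                       [-1j * np.sin(beta), np.cos(beta)]])
        mixer = reduce(np.kron, [rx] * n)          # dense matrix, small n only
        state = mixer @ (np.exp(-1j * gamma * cuts) * state)
        rewards.append(0.0)                        # assumed: no intermediate reward
    # Final reward: expectation value of the objective in the final state.
    rewards[-1] = float(np.abs(state) ** 2 @ cuts)
    return rewards

# Usage with a trivial policy on a triangle graph.
triangle = [(0, 1), (1, 2), (0, 2)]
print(run_episode(3, triangle, lambda t: (0.0, 0.0), p=3))
```

With all angles set to zero the state never leaves the uniform superposition, so the final reward is the average cut size over all bit strings, |E|/2.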

Moreover, an incremental training strategy that allows the agent to reach longer episode lengths p’ > p is successfully applied to graphs with 21 qubits and p up to 25.