The approach proposed by Leike et al. identifies unaligned AGI as a major risk to humanity. However, a narrow focus on model alignment may lead researchers to overlook larger societal-scale risks. Leike et al. outline several risks under the limitations of their approach, the most concerning of which is the amplification of human biases through reinforcement learning from human feedback (RLHF). Despite acknowledging this limitation, the authors describe RL as a “core building block for the scalable alignment proposals”. However, RLHF has been found to optimise for sycophantic responses over accurate ones as AI systems are scaled (Sharma et al., 2023). For example, Perez et al. (2022) found evidence of political sycophancy and gender bias, with larger models more likely to repeat back the user’s political views. The authors propose three methods to assist with scaling alignment: recursive reward modelling (RRM), debate, and iterated amplification. Yet all three rely on reinforcement learning from human evaluation, an aspect of their approach that makes it difficult to avoid the problems that have emerged with RLHF. The authors acknowledge that there is currently no scalable solution to the alignment problem, and they suggest that eliminating the need for humans to write goal functions is one possible route forward. In the meantime, RLHF remains sensitive to the beliefs and preferences of the human evaluators and could amplify pre-existing biases. The authors undervalue the importance of avoiding sycophantic responses in their approach to alignment research.
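
The mechanism behind this concern can be made concrete with a toy numerical sketch (not drawn from Leike et al. or the cited studies; all names, features, and numbers are hypothetical). It fits a simple Bradley-Terry style reward model to simulated pairwise preferences in which the evaluator sometimes prefers the response that agrees with them over the more accurate one. As that tendency grows, the learned reward shifts toward rewarding agreement, which is one way evaluator bias can propagate into the trained model.

```python
# Toy sketch (hypothetical): a Bradley-Terry reward model fitted to simulated
# pairwise preferences. Each response has two features: [accuracy, agreement].
# If evaluators sometimes prefer the agreeable-but-inaccurate response, the
# fitted reward weights drift toward rewarding agreement.
import numpy as np

rng = np.random.default_rng(0)


def simulate_comparisons(n_pairs: int, sycophancy_rate: float) -> np.ndarray:
    """Return feature differences (chosen - rejected) for n_pairs comparisons."""
    diffs = []
    for _ in range(n_pairs):
        accurate = np.array([1.0, 0.0])   # accurate but disagrees with the user
        agreeable = np.array([0.0, 1.0])  # agrees with the user but is inaccurate
        # With probability `sycophancy_rate` the evaluator picks the agreeable one.
        if rng.random() < sycophancy_rate:
            chosen, rejected = agreeable, accurate
        else:
            chosen, rejected = accurate, agreeable
        diffs.append(chosen - rejected)
    return np.array(diffs)


def fit_reward_weights(diffs: np.ndarray, lr: float = 0.1, steps: int = 2000) -> np.ndarray:
    """Logistic (Bradley-Terry) fit: P(chosen preferred) = sigmoid(w . diff)."""
    w = np.zeros(diffs.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-diffs @ w))
        grad = diffs.T @ (1.0 - p) / len(diffs)  # gradient of the log-likelihood
        w += lr * grad
    return w


for rate in (0.2, 0.5, 0.8):
    diffs = simulate_comparisons(2000, sycophancy_rate=rate)
    w_accuracy, w_agreement = fit_reward_weights(diffs)
    print(f"sycophancy_rate={rate}: reward weight for accuracy={w_accuracy:.2f}, "
          f"agreement={w_agreement:.2f}")
```

At low rates of evaluator sycophancy the fitted reward favours accuracy; once agreeing responses are preferred more often than not, the sign flips and the reward model actively rewards agreement. This is only an illustration of the incentive structure, not a claim about any particular system.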