Cody Rushing

  • Cody Rushing works at Redwood Research on high-stakes control.

  • I’m most interested in mentoring projects related to AI Control. I am especially well positioned to mentor projects on high-stakes control, such as:

    • Extending some of the work we did in our Ctrl-Z paper or the original Control Paper. Examples:

      • Measuring how well an attack policy can predict the outputs of a different model. This is useful for understanding how well the attack policy might be able to perform attack selection, and for informing methodology around AI Control.

      • Giving the blue team and red team different resource constraints, and understanding how the results change.

    • Building a novel setting for high-stakes control and then running some basic experiments within it.

      • One useful goal here is to understand how sensitive the success of control protocols is to changes in the setting, and whether protocols generalize effectively.

    • [Maybe] Investigating how likely it is that models of similar sizes, training procedures, or architectures produce similar Schelling points or goals, including similar beyond-episode goals.

    • [Maybe] Studying whether we can cheaply tweak models in order to decouple monitors. The goal would be to take an LLM that can currently collude with itself and cheaply modify it so that it no longer can.

    I would also be willing to help support projects related to ideas in this post.

  • Candidates should be able to execute on empirical research projects quickly. They should be proficient in Python and fast software engineers. Being earnest and motivated is important; credentials matter less, but having built cool projects is a plus.

Member of Technical Staff, Redwood Research