Cody Rushing

Cody Rushing works at Redwood Research on high-stakes control.
I’m most interested in mentoring projects related to AI Control. I am especially well positioned to mentor projects on high-stakes control, such as:
- Extending some of the work we did in our Ctrl-Z paper or the original Control Paper. Examples:
  - Measuring how well an attack policy can predict the outputs of a different model. This is useful for understanding how well it might be able to perform attack selection, and for informing methodology around AI Control.
  - Giving the blue team and red team different resource constraints, and understanding how the results change.
- Building a novel setting for high-stakes control and then running some basic experiments within it.
  - One useful goal here is to understand how sensitive the success of control protocols is to changes in the setting, and whether protocols generalize effectively.
- [Maybe] Investigating how likely it is that models of similar sizes/training procedures/architectures produce similar Schelling points or goals, including similar beyond-episode goals.
- [Maybe] Studying whether we can cheaply tweak models in order to decouple monitors. The goal would be to take an LLM that can currently collude with itself, and cheaply modify it so that it no longer can.
I would also be willing to help support projects related to ideas in this post.
Candidates should be able to execute on empirical research projects quickly. They should be proficient with Python and fast software engineers. Being earnest/motivated is important; credentials are less important, but having built cool projects is a plus.

Member of Technical Staff, Redwood Research