Cody Rushing
-
Cody Rushing works at Redwood Research on high-stakes control.
-
I’m most interested in mentoring projects related to AI Control. I am positioned especially well to mentor projects related to high-stakes control, such as:
Extending some of the work we did in our Ctrl-Z paper or the original Control Paper. Examples:
Measuring how well an attack policy can predict the outputs of a different model. This is useful for understanding how well the attack policy might be able to perform attack selection, and for informing methodology around AI Control.
Providing different resource constraints to the blue team and red team, and understanding how the results change.
Building a novel setting for high-stakes control and then running some basic experiments within it
One useful thing here will be to understand the sensitivity of the success of control protocols to changes in settings, and whether or not protocols generalize effectively
[Maybe] Investigating how likely it is that models of similar sizes, training procedures, or architectures produce similar Schelling points or goals, or similar beyond-episode goals.
[Maybe] Studying whether we can tweak models in order to decouple monitors cheaply. The goal would be to take an LLM that can currently collude with itself and cheaply modify it so that it no longer can.
I would also be willing to help support projects related to ideas in this post.
-
Candidates should be able to execute on empirical research projects quickly. They should be proficient with Python and be fast software engineers. Being earnest and motivated is important; credentials are less important, but having built cool projects is a plus.
Member of Technical Staff, Redwood Research