I am currently working on three main research projects.
One project involves empirical and conceptual work on evaluating machine learning (ML) models. In particular, I am interested in effective strategies for evaluating both the capabilities and alignment of large, general-purpose frontier ML models deployed in open-ended environments. Strategies for model evaluation are subject to several desiderata. For instance, an adequate strategy must make sense of how “benchmarks” bear on capabilities (and alignment) and how particular user interactions with models serve as evidence of capabilities (and alignment). We should also prefer an evaluation framework that makes sense of various model-specific ways of changing how models manifest capabilities (e.g., fine-tuning for LLMs). An ideal strategy for model evaluation will also be scalable (e.g., by being amenable to automation), in the sense that it can be deployed to evaluate the capabilities (and alignment) of arbitrarily large (and capable) models.
My second ongoing project builds on work from my first book. I am interested in whether there are strategies for improving people’s epistemic health (e.g., by improving their beliefs or dispositions to trust others) that have four properties: they are effective, in the sense that they are demonstrably capable of improving matters; practicable, in the sense that they can be executed at scale; morally permissible, in the sense that they respect commonsense constraints on how we can treat persons; and (much less importantly!) philosophically respectable, in the sense that they are compatible with our best theories of epistemology. My view is that no strategy has all four properties, and, worryingly, there may not even be philosophically disreputable strategies that have the first three.
My third project, currently in the planning stages, is a manuscript on the nature, value, and importance of artificial achievement.