Integrative experiments
The dominant way we run experiments in social and behavioral science — one experiment at a time, each treated as a test of a theory assumed to generalize — has a serious problem. The integration across experiments that is supposed to happen in the published record largely doesn’t, and the recent push for more reliable single findings doesn’t fix it. You can do every individual experiment perfectly well and still not end up with a cumulative theory.
Abdullah Almaatouq, Tom Griffiths, Jordan Suchow, James Evans, Duncan Watts, and I argue that the fix has to happen at the level of experimental design. In integrative experiments, researchers explicitly map the space of possible experiments associated with a research question, then iteratively sample from that space. Instead of trying to defend a single experimental condition as the one that captures the phenomenon, you treat the design space itself as the object of study. The paper appeared as a target article in Behavioral and Brain Sciences (BBS).
The paper drew a large set of commentaries from across the field, which we responded to in a follow-up in BBS. The discussion — about what theories are for, how generalization should work, and whether the field needs a different unit of analysis — is, for me, the most interesting part of the project.
Most of the other threads in my recent work connect here. Empirica is the infrastructure that makes integrative experiments actually runnable; the common sense framework and the task space are both attempts to make the design space tractable in a particular domain; and the forecasting results are one way of showing that the status quo leaves a lot on the table.