Common sense is one of those ideas that everyone appeals to and no one agrees on. It’s supposedly universal, but it’s also regularly invoked to complain that other people don’t have any. That tension — universal in principle, contested in practice — is actually measurable, if you’re careful about what you count.

With Duncan Watts and collaborators, I developed a method to quantify common sense empirically, at two levels: for an individual (how aligned is this person’s take on a claim with everyone else’s?) and for a collective (how much does a group actually agree?). Running the method over a large set of human-rated claims, we found that what we think of as “common sense” varies a lot depending on the kind of claim. The clearest agreement shows up on plainly worded factual claims about the physical world; agreement drops off sharply for claims that are social, normative, or ambiguously worded. Interestingly, who the raters are matters much less than what kind of claim it is. And at the collective level, the universal common sense people often assume exists mostly doesn’t (the paper appeared in PNAS).
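To make the two levels concrete, here is a deliberately simplified sketch, not the paper’s actual estimator: it treats ratings as a binary rater-by-claim matrix and scores an individual by agreement with the majority of other raters, and a collective by mean pairwise agreement. All function names and the tie-breaking rule are illustrative assumptions.

```python
# Illustrative simplification — NOT the paper's exact method.
# ratings is a list of rows (one per rater) of 0/1 judgments (one per claim):
# 1 = rater endorses the claim, 0 = rater rejects it.

def individual_agreement(ratings, i):
    """Fraction of claims where rater i matches the majority of the OTHER raters.

    Ties among the other raters are broken in favor of endorsement (an
    arbitrary choice made for this sketch).
    """
    n_raters = len(ratings)
    n_claims = len(ratings[0])
    matches = 0
    for j in range(n_claims):
        others = [ratings[k][j] for k in range(n_raters) if k != i]
        majority = 1 if 2 * sum(others) >= len(others) else 0
        if ratings[i][j] == majority:
            matches += 1
    return matches / n_claims

def collective_agreement(ratings):
    """Mean pairwise agreement: for each pair of raters, the fraction of
    claims they rate identically, averaged over all pairs."""
    n_raters = len(ratings)
    n_claims = len(ratings[0])
    total, pairs = 0.0, 0
    for a in range(n_raters):
        for b in range(a + 1, n_raters):
            same = sum(ratings[a][j] == ratings[b][j] for j in range(n_claims))
            total += same / n_claims
            pairs += 1
    return total / pairs

# Toy example: 3 raters, 4 claims.
ratings = [
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
]
print(individual_agreement(ratings, 0))  # rater 0 vs. the other two
print(collective_agreement(ratings))     # group-level consensus
```

The point of the toy matrix is that both quantities fall on the same 0–1 scale, so the same framework can score a person, a group, or (as noted below) a language model.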

The paper’s full significance statement spells this out:

Common sense, while often portrayed as universal, is paradoxically also often claimed not to exist. Here, we resolve this puzzling situation by introducing a formal methodology to empirically quantify common sense at both individual and collective levels. We then demonstrate the method with a dataset involving human raters evaluating claims. We show that common sense varies considerably across types of claims and aligns most closely with plainly worded, factual claims about physical reality; in contrast, it does not vary much across different types of people. We also find limited presence of collective common sense, undermining universalist claims and supporting skeptics. Finally, we argue that quantifying common sense is useful both for applications in social science and AI.

The method turns out to be useful beyond humans — we’ve since used it to evaluate common sense in large language models, where having a single framework that applies to both humans and machines is genuinely handy.

The work has received attention from several outlets, which Altmetric summarizes nicely: