Oblique Ways of Measuring Intelligence

If you want to measure someone’s intelligence, the standard approach is to give them an intelligence test, or some reasonably accurate proxy: a standardized test, an admissions test, a rationality test, etc. But these are obviously measures of intelligence; the person taking them knows what you’re doing. What if you want to estimate someone’s intelligence without making it obvious that this is what you’re measuring?

Eye tracking during Raven’s Matrices

One approach is to give them a Raven’s Matrices test, which, to be fair, is still an intelligence test. But instead of scoring the test-taker based on how many items they answer correctly, you can track their eye movements while they take the test. In “Using a multi-strategy eye-tracking psychometric model to measure intelligence and identify cognitive strategy in Raven’s advanced progressive matrices,” the researchers use eye-tracking data to infer which strategy participants are using on each item.

There are two common strategies people use when solving matrix questions: a constructive matching strategy, in which participants look at the matrix, mentally construct the answer, and then choose the matching option, and a response elimination strategy, in which participants look back and forth between the matrix and the possible answers, iteratively eliminating options.

For example, in the figure below, on the first item, Participant 1 mostly looked at the matrix interest area, apparently constructing the answer in their head, and then quickly glanced around the response-options interest area, found two plausible answers, and picked one. On the second item, the same participant kept looking back and forth between the matrix and the response options, apparently fitting the symbols into the diagram and eliminating them one by one until they made a choice.

Figure: Eye-tracking scanpaths from Raven’s Advanced Progressive Matrices. One participant solves two items, contrasting a matrix-focused strategy with frequent back-and-forth looks between the matrix and the answer choices.

It turns out that people adjust their strategies based on both item difficulty and their own ability. As items become harder, more intelligent participants become more likely to use constructive matching, while less intelligent participants become more likely to use response elimination. Using only the information about which strategies participants use for each item, researchers were able to calculate intelligence scores that correlated an astonishing 0.986 with the intelligence scores calculated from which items participants answered correctly.
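
To make the two strategies concrete, here is a minimal sketch that labels a scanpath by how often the gaze switches between the matrix and the response options. This is not the paper's psychometric model: the fixation sequences are made up, and the `toggle_rate` and `guess_strategy` helpers and the 0.3 cutoff are assumptions chosen purely for illustration.

```python
from typing import List


def toggle_rate(fixations: List[str]) -> float:
    """Share of consecutive fixation pairs that switch interest areas."""
    if len(fixations) < 2:
        return 0.0
    switches = sum(1 for a, b in zip(fixations, fixations[1:]) if a != b)
    return switches / (len(fixations) - 1)


def guess_strategy(fixations: List[str], threshold: float = 0.3) -> str:
    """Label a scanpath using a hypothetical toggle-rate cutoff."""
    if toggle_rate(fixations) > threshold:
        return "response elimination"
    return "constructive matching"


# "M" = fixation on the matrix, "R" = fixation on the response options.
# Both sequences are invented to mimic the two patterns described above.
item_1 = list("MMMMMMMMRRR")  # long look at the matrix, then a brief scan of the options
item_2 = list("MRMRMMRMRRM")  # repeated back-and-forth

for name, path in (("item 1", item_1), ("item 2", item_2)):
    print(name, toggle_rate(path), guess_strategy(path))
```

A real model would also have to account for item difficulty and for participants who mix strategies within a single item, which a single cutoff cannot capture.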

Predicting intelligence from the MMPI

What if we don’t want them to take a test that looks so much like an intelligence test? We could instead have them take the Minnesota Multiphasic Personality Inventory (MMPI), a psychometric test designed to measure personality and psychopathology. The MMPI consists of 567 short statements, each of which the test-taker answers True or False. In “Can Intelligence Be Predicted from the MMPI? An Out-of-Sample Validation Study,” Kirkegaard uses data from the Vietnam Experience Study, in which participants took both the MMPI and an intelligence test, to estimate how well intelligence can be predicted from MMPI item responses.

He finds that out-of-sample predictions of intelligence from MMPI responses, using an elastic net model, correlate with intelligence test scores at r = 0.84. This is very high, about as good as an actual intelligence test.

Figure: Scatterplot of out-of-sample intelligence predictions from MMPI items against measured intelligence, with a smooth fit and strong positive association.

And you don’t even need that many items: the correlation reaches 0.80 with roughly 100 items and 0.70 with roughly 25 items.

Figure: Line graph showing cross-validated prediction accuracy as the number of MMPI items in the model increases.
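
For readers who want to see what “out-of-sample prediction with an elastic net” means mechanically, here is a rough sketch in plain Python. Everything in it is an assumption made for illustration: the data are synthetic True/False answers generated from a made-up ability variable, the penalty settings `lam` and `alpha` are arbitrary, and the hand-rolled coordinate-descent fit stands in for whatever software the study actually used. It only shows the shape of the procedure: fit on one group of people, then correlate predictions with scores in a group the model never saw.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold(z, t):
    """Shrink z toward zero by t (the L1 part of the penalty)."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def fit_elastic_net(X, y, lam=0.05, alpha=0.5, n_iter=50):
    """Coordinate descent on centered columns for the objective
    (1/2n) * sum(resid^2) + lam * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - means[j] for row in X] for j in range(p)]
    intercept = sum(y) / n
    beta = [0.0] * p
    resid = [yi - intercept for yi in y]
    for _ in range(n_iter):
        for j in range(p):
            col = cols[j]
            col_ss = sum(v * v for v in col) / n
            if col_ss == 0.0:
                continue  # a constant item carries no information
            # Covariance of item j with the residual once item j's own
            # contribution has been added back in.
            rho = sum(v * r for v, r in zip(col, resid)) / n + col_ss * beta[j]
            new_beta = soft_threshold(rho, lam * alpha) / (col_ss + lam * (1 - alpha))
            delta = new_beta - beta[j]
            if delta != 0.0:
                resid = [r - delta * v for r, v in zip(resid, col)]
                beta[j] = new_beta
    return intercept, means, beta


def predict(model, X):
    """Apply a fitted model to new rows, centering with the training means."""
    intercept, means, beta = model
    return [intercept + sum(b * (v - m) for b, v, m in zip(beta, row, means))
            for row in X]


# Synthetic stand-in data: 300 people, 40 True/False items (not MMPI data).
random.seed(0)
n_people, n_items = 300, 40
loadings = [random.uniform(-1.0, 1.0) for _ in range(n_items)]
ability = [random.gauss(0, 1) for _ in range(n_people)]
answers = [[1.0 if random.random() < 1 / (1 + math.exp(-w * g)) else 0.0
            for w in loadings] for g in ability]
score = [g + random.gauss(0, 0.3) for g in ability]  # noisy "test score"

train, test = slice(0, 200), slice(200, 300)
model = fit_elastic_net(answers[train], score[train])
r_out = pearson(predict(model, answers[test]), score[test])
print(f"out-of-sample r = {r_out:.2f}")
```

The important design choice is that the last 100 people play no part in fitting. With hundreds of candidate items, an in-sample correlation would be inflated by overfitting, which is why the out-of-sample figure is the one worth quoting.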

Response consistency and response styles

There’s a third method: instead of looking directly at people’s responses, we can look at how their responses differ from a model-predicted “ideal.” This method, developed in the paper “Can You Tell People’s Cognitive Ability Level from Their Response Patterns in Questionnaires?”, makes use of two ideas. The first is the worst performance rule, which says that people’s worst performance across multiple sequential tasks is more indicative of their cognitive ability than their average or best performance. The second is the task complexity hypothesis, which says that relationships between cognitive ability and performance become stronger as task complexity increases.

In the paper, answering questionnaire items is treated as a series of cognitively demanding tasks. The researchers use an IRT model to estimate each participant’s level of the relevant latent traits, then calculate the difference between each participant’s observed response and their model-predicted response for each item. These differences are called “response error” scores. The latent variable estimated from response errors on the most complex items correlated 0.50 with a latent cognitive ability factor. In other words, people who give more internally consistent answers tend to be smarter as well.
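
The response-error idea can be sketched with a deliberately simplified model. The example below assumes True/False items and a two-parameter logistic (2PL) IRT model, which is a stand-in and not necessarily the model used in the paper. The item parameters, the trait estimate `theta`, and the helper names are all hypothetical, and `theta` is held fixed for both response patterns to keep the comparison easy to read, whereas in practice it would be estimated from each person's own answers.

```python
import math


def p_endorse(theta: float, a: float, b: float) -> float:
    """2PL probability of answering True, given trait level theta,
    item discrimination a, and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def response_errors(responses, theta, items):
    """Absolute gap between each observed 0/1 response and its predicted value."""
    return [abs(x - p_endorse(theta, a, b)) for x, (a, b) in zip(responses, items)]


def mean_error(responses, theta, items):
    """Average response error across items."""
    errors = response_errors(responses, theta, items)
    return sum(errors) / len(errors)


# Hypothetical (discrimination, difficulty) pairs, ordered from easy to hard.
items = [(1.5, -1.0), (1.2, 0.0), (1.8, 1.0), (1.0, 2.0)]
theta = 0.5  # assumed trait estimate

consistent = [1, 1, 0, 0]    # endorses the easy items, rejects the hard ones
inconsistent = [0, 1, 1, 0]  # rejects the easiest item but endorses a harder one

print(mean_error(consistent, theta, items))
print(mean_error(inconsistent, theta, items))
```

The second pattern produces the larger average error because it contradicts what the model expects from someone at that trait level. That gap is the raw material for the “response error” scores described above.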

There are also many other kinds of response styles with distinctive relationships to intelligence. For example, acquiescence bias, the tendency to agree with items regardless of their content, and extreme responding, the tendency to choose the endpoints of a response scale, both show negative correlations with intelligence. In principle, these response styles could also be used as indirect signals of intelligence.
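
Both response styles are easy to turn into crude indices. The sketch below assumes a 1-to-5 agreement scale and uses made-up answers; the function names and the simple proportion-based definitions are illustrative choices, not taken from any particular study.

```python
def acquiescence(responses, scale_max=5):
    """Share of responses on the agree side of the scale midpoint."""
    midpoint = (1 + scale_max) / 2
    return sum(r > midpoint for r in responses) / len(responses)


def extreme_responding(responses, scale_max=5):
    """Share of responses at either endpoint of the scale."""
    return sum(r in (1, scale_max) for r in responses) / len(responses)


# Invented answers on a 1-5 scale (1 = strongly disagree, 5 = strongly agree).
answers = [5, 4, 5, 1, 5, 4, 3, 5, 4, 5]

print(acquiescence(answers))        # 8 of 10 answers are above the midpoint
print(extreme_responding(answers))  # 6 of 10 answers are a 1 or a 5
```

A raw agreement rate only reflects acquiescence if the items vary in content and direction. On a questionnaire where every item is keyed the same way, a high agreement rate could simply mean the person scores high on the trait.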

Takeaway

None of these methods are magic. They work because intelligence is not an isolated trait that only appears on intelligence tests. It leaks into other behaviors: how people search a problem space, how they answer personality items, how internally consistent their responses are, and even which kinds of response biases they show.

This is both interesting and slightly uncomfortable. Eye movements, questionnaire responses, and response styles are not usually treated as intelligence measures, but in aggregate, they can become exactly that. And it raises an obvious question: how far does this generalize? If Raven’s scanpaths, MMPI answers, and questionnaire response errors contain this much signal, what about writing style, browsing behavior, social media activity, speech patterns, reaction times, mouse movements, or the thousands of other small behavioral traces people leave behind?

Presumably, most of these signals are weak. But our world is inferentially entangled: traits leave traces, traces correlate with other traces, and enough weak signals can add up to a surprisingly strong proxy. If you’re smart enough, or if you simply measure precisely enough, you can gain a great deal of hidden knowledge.
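
A back-of-the-envelope calculation shows how quickly weak signals can add up. The sketch below rests on a strong simplifying assumption: every signal correlates with the trait at the same value r, and the signals are related to each other only through the trait, so any two of them correlate r squared. Under that assumption, the correlation between the trait and the plain sum of k signals is r * sqrt(k) / sqrt(1 + (k - 1) * r^2). The function name and the choice of r = 0.1 are illustrative.

```python
import math


def composite_validity(r: float, k: int) -> float:
    """Correlation between a trait and the unit-weighted sum of k signals,
    assuming each signal correlates r with the trait and the signals are
    otherwise independent of one another."""
    return r * math.sqrt(k) / math.sqrt(1 + (k - 1) * r * r)


for k in (1, 10, 100, 1000):
    print(k, round(composite_validity(0.1, k), 3))
```

With r = 0.1, a hundred such signals already give a composite that correlates about 0.71 with the trait. Real behavioral traces overlap with each other for reasons unrelated to the trait, so actual gains would be smaller, but the direction of the effect is the same.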