The Watson Glaser Critical Thinking Appraisal

A Comprehensive Guide

Introduction

Nova: Imagine taking a test where the more you know about logic and reasoning, the worse you might actually score. Sounds absurd, right? Yet that's exactly the paradox at the heart of one of the most influential psychological assessments of the last century — the Watson-Glaser Critical Thinking Appraisal.

Nova: : Wait, hold on. That can't be right. A critical thinking test that penalizes you for being good at critical thinking?

Nova: It sounds counterintuitive, but there's a fascinating academic paper with the provocative title "The More You Know, the Lower Your Score." We're going to unpack that. But first, let's set the stage. The Watson-Glaser Critical Thinking Appraisal, often just called the Watson-Glaser test, traces its roots back nearly a century. It's widely treated as the gold standard for measuring critical thinking in recruitment, especially in law. If you've applied to a major law firm like Clifford Chance, Linklaters, or Hogan Lovells, you've probably faced it.

Nova: : And I'm guessing this isn't actually a book by someone named H. Watson?

Nova: Sharp catch. A lot of people make that assumption — that it's a book by H. Watson. The truth is more interesting. The test was created by two men: Goodwin Barbour Watson, a professor of social psychology at Columbia University's Teachers College, and Edward Maynard Glaser, a consulting psychologist. No H. Watson at all. Watson started developing the ideas behind the test way back in 1925, and Glaser's 1937 doctoral dissertation — titled "An Experiment in the Development of Critical Thinking" — became the foundation for what they published together in 1949.

Nova: : So almost a hundred years of history. And originally this was for soldiers?

Nova: Exactly. It was developed to assess critical thinking in military recruits. But over the decades it migrated into business, government, and especially law. Today it's published by Pearson under their TalentLens division, and it's used across the globe. Forty questions, thirty minutes, and it can make or break your career at a top firm. Ready to dig in?

Nova: : Let's do it.

The Origins of the Watson-Glaser Test

Two Men, One Mission

Nova: Let's start with the two men behind the test, because their story tells us a lot about why this assessment exists. Goodwin Watson was born in 1899 in Whitewater, Wisconsin. He became a professor at Teachers College, Columbia University, where he taught from 1925 to 1962. But he wasn't just an academic — he was a social activist and reformer. He founded the Union Graduate School and was deeply involved in progressive education movements.

Nova: : So he was thinking about how people think, but also about how society works?

Nova: Precisely. And then there's Edward Maynard Glaser, born in 1911. While doing his PhD at Columbia, he conducted a remarkable experiment. He actually tried to teach critical thinking and then measure whether his teaching worked. That dissertation became the seed of everything. He later became a consulting psychologist at the firm Rohrer, Hibler, and Replogle in Los Angeles, and he went on to write extensively about organizational effectiveness and leadership.

Nova: : I'm curious about that experiment. What did Glaser actually do?

Nova: He designed a teaching program in critical thinking and delivered it to students, then measured their abilities before and after using specially designed questions. Those questions tested five specific skills, and those same five skills became the sections of the test we know today: Inference, Recognition of Assumptions, Deduction, Interpretation, and Evaluation of Arguments. In 1949, Watson and Glaser published the first version of the appraisal through the World Book Company. The manual followed in 1952.

Nova: : So the test was born from an actual educational experiment, not just someone sitting in a room designing tricky questions.

Nova: That's what makes it so durable. It's rooted in real pedagogy. But here's a fun fact — the original test booklet from 1949 is now in the collection of the Smithsonian's National Museum of American History. An eight-page psychological test preserved as a historical artifact. That tells you something about its cultural significance.

Nova: : It really was a landmark. What happened after 1949?

Nova: The test went through multiple revisions. It moved from the World Book Company to Harcourt, Brace and World, then to the Psychological Corporation, and now it's under Pearson. We're currently on the third major edition — Watson-Glaser III. Each revision has refined the questions and updated norms, but the five-section structure has remained remarkably consistent for over seventy years.

Nova: : And the core idea — measuring whether someone can think, not just what they know.

Nova: Exactly. And that brings us to how the test actually works.

The Structure of the Test

Inside the Five Chambers

Nova: So picture this: you sit down at a computer. You have forty questions and thirty minutes. That's about forty-five seconds per question. And the questions are divided into five distinct sections, each testing a different muscle of critical thinking.

Nova: : Let me guess — the five you just mentioned. Walk me through them.

Nova: Section one: Inference, with five questions. You get a paragraph of facts, and you must assume every word of it is true. Then you're given several possible conclusions and you have to rate each one as True, Probably True, Insufficient Data, Probably False, or False. Five answer options. This is the only section with a five-point scale.

Nova: : So you're not saying "I think this is true based on my life experience." You're saying "based strictly on what's written here, how likely is this conclusion?"

Nova: That distinction is everything, and it's where many test-takers stumble. They bring in outside knowledge and get penalized. Section two: Recognition of Assumptions. You read a statement, then you're asked whether a given assumption is actually made within it. Twelve questions, binary choice — Assumption Made or Assumption Not Made.

Nova: : This sounds like it's testing whether you can spot the invisible scaffolding behind an argument.

Nova: Beautifully put. The scaffolding that the arguer never says out loud but depends on completely. Section three: Deduction. Five questions where you're given premises — statements you must treat as absolute truth — and a conclusion. Your job: does the conclusion necessarily follow? Yes or no. This is formal logic, stripped of real-world judgment.

Nova: : So even if the conclusion is absurd in real life, if it follows logically from the premises, you say it follows.

Nova: Exactly. Section four: Interpretation. Six questions. You read a short passage, then a conclusion, and you decide whether that conclusion follows beyond a reasonable doubt. It's less rigid than deduction — more about practical reasoning. And section five: Evaluation of Arguments. Twelve questions. You're given a proposition — like "Should all employees be allowed to work from home?" — and then several arguments for or against. You rate each as Strong or Weak.
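The section layout and the time budget described so far can be tallied in a few lines. This is only a sketch: the dictionary name and keys are illustrative, the question counts are the ones quoted in this episode (the Inference count of five comes up again in the time-management discussion), and real test versions may differ.

```python
# Section layout as described in this episode (illustrative structure,
# not an official specification of the test).
SECTIONS = {
    "Inference": {"questions": 5, "options": 5},
    "Recognition of Assumptions": {"questions": 12, "options": 2},
    "Deduction": {"questions": 5, "options": 2},
    "Interpretation": {"questions": 6, "options": 2},
    "Evaluation of Arguments": {"questions": 12, "options": 2},
}
TIME_LIMIT_MINUTES = 30

# Total number of questions across all five sections.
total_questions = sum(s["questions"] for s in SECTIONS.values())

# Average time available per question, in seconds.
seconds_per_question = TIME_LIMIT_MINUTES * 60 / total_questions

print(total_questions)       # 40
print(seconds_per_question)  # 45.0
```

The 45 seconds is an average only. It says nothing about how long any individual section actually takes.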

Nova: : And what makes an argument strong versus weak?

Nova: It has to be both important and directly relevant to the question. An argument can be factually true but still weak if it misses the point. And you must ignore your personal opinion entirely. You might be a passionate advocate for remote work, but if an argument against it is logically sound and relevant, you mark it Strong.

Nova: : So the test is basically saying: can you separate your identity and beliefs from the cold machinery of logic?

Nova: That might be the best summary I've heard. And that brings us to a framework that organizes all five sections.

Pearson's Model of Critical Thinking

RED: The Engine Under the Hood

Nova: Pearson, the current publisher, organizes the Watson-Glaser around something called the RED model. R-E-D. Recognize assumptions, Evaluate arguments, Draw conclusions.

Nova: : So it collapses five sections into three core skills?

Nova: It's more of a conceptual umbrella. Recognize Assumptions maps directly to section two. Evaluate Arguments maps to section five. And Draw Conclusions encompasses Inference, Deduction, and Interpretation — the three sections where you're reaching a judgment based on given information.
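As a rough illustration, the mapping just described can be written down as a lookup table. The names RED_MODEL and red_skill_for are made up for this sketch and do not come from Pearson.

```python
# RED skill -> test sections, following the mapping given in the episode.
RED_MODEL = {
    "Recognize Assumptions": ["Recognition of Assumptions"],
    "Evaluate Arguments": ["Evaluation of Arguments"],
    "Draw Conclusions": ["Inference", "Deduction", "Interpretation"],
}


def red_skill_for(section: str) -> str:
    """Return the RED skill that a given test section falls under."""
    for skill, sections in RED_MODEL.items():
        if section in sections:
            return skill
    raise KeyError(f"Unknown section: {section}")


print(red_skill_for("Deduction"))  # Draw Conclusions
```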

Nova: : It's elegant. But why does this model matter beyond the test?

Nova: Because this isn't just test prep jargon. Pearson's research shows that mentions of critical thinking in job postings more than doubled between 2009 and 2014. Companies are desperate for people who can do exactly what the RED model describes: see past surface claims, weigh evidence, and reach sound judgments. The Watson-Glaser just happens to be the most established way to measure it.

Nova: : And law firms are the biggest users?

Nova: Law firms, absolutely — Clifford Chance, Linklaters, Hogan Lovells, the UK Government Legal Service. But also consulting firms, financial services, healthcare organizations. Anywhere that analytical reasoning directly impacts outcomes. The test has predictive validity — research shows a correlation of about 0.62 between Watson-Glaser scores and training success in legal education.

Nova: : What does 0.62 mean in practical terms?

Nova: It's a moderate-to-strong correlation. It means that people who score higher on Watson-Glaser tend to perform better in legal training. Not perfectly — no test is — but significantly enough that firms keep using it. Pearson's 2020 efficacy report put the internal consistency reliability of Watson-Glaser III at 0.83, which is considered good.
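One common way to put a correlation into practical terms is to square it, which gives the share of variance the two measures have in common. The sketch below applies that to the 0.62 figure quoted above. It assumes a simple linear relationship and is an illustration, not a calculation taken from Pearson's reports.

```python
# Correlation quoted in the episode between test scores and training success.
r = 0.62

# Squaring r gives the proportion of variance shared by two linearly
# related measures (the coefficient of determination).
shared_variance = r ** 2

print(f"{shared_variance:.0%}")  # 38%
```

On that reading, a bit over a third of the variation in training outcomes lines up with test scores, and the rest comes from everything else.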

Nova: : But here's what I want to know: what score do you actually need?

Nova: There's no official pass mark. Instead, firms compare candidates against percentile rankings. For top London law firms, you typically need to hit the 75th to 80th percentile. That translates to roughly 33 to 34 correct answers out of 40. The average score hovers around 55 percent — about 22 out of 40. So the bar for elite firms is high.
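The raw scores mentioned here are easy to convert into percent correct, which is not the same thing as a percentile. A small sketch, using a hypothetical helper named percent_correct:

```python
TOTAL_QUESTIONS = 40


def percent_correct(raw_score: int) -> float:
    """Convert a raw score out of 40 into percent correct."""
    return 100 * raw_score / TOTAL_QUESTIONS


for raw in (22, 33, 34):
    print(raw, percent_correct(raw))
# 22 55.0
# 33 82.5
# 34 85.0
```

Turning a raw score into a percentile depends on the norm group a firm compares candidates against, and that norm data is not reproduced here.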

Nova: : And if you score in the 90th percentile?

Nova: You're very likely to clear the screening stage at almost any firm. But here's where things get really interesting, and controversial.

Criticisms and Controversies

The More You Know, The Lower You Score

Nova: In 2014, a philosophy professor named Kevin Possin published a devastating critique of the Watson-Glaser in the journal Informal Logic. The title says it all: "Critique of the Watson-Glaser Critical Thinking Appraisal Test: The More You Know, the Lower Your Score."

Nova: : That's the paradox you mentioned at the beginning. What's his argument?

Nova: Possin argues that the test has serious construct-validity issues. That is, it may not actually measure what it claims to measure. He points to ambiguous, unclear, and sometimes misleading instructions that have remained largely unaltered for decades. Worse, he claims some items are erroneously scored — meaning the official correct answer is actually wrong from a rigorous logical perspective.

Nova: : So someone with advanced training in formal logic might see nuances that the test doesn't account for, and they'd choose what they know to be the logically correct answer — which the test marks as wrong.

Nova: Exactly. Possin's core claim is that enhanced knowledge of formal and informal logic could result in test-takers receiving lower scores. A philosophy PhD could perform worse than a smart undergraduate who hasn't been trained to overthink. That's a fundamental problem for any assessment that calls itself a critical thinking test.

Nova: : Has Pearson responded to this?

Nova: The test has been revised — we're on version three — and Pearson does publish efficacy data. But Possin's critique specifically targets the persistent ambiguities that survive across revisions. Other researchers have also found inconsistent evidence of validity and reliability. A 2013 study by Verburgh and colleagues reported low reliability coefficients in certain contexts. And a 2015 study raised questions about how well Watson-Glaser scores predict actual degree performance across disciplines.

Nova: : So on one hand, law firms swear by it. On the other, philosophers say it's logically flawed.

Nova: That tension is part of what makes the Watson-Glaser so fascinating. It sits at the intersection of psychometrics, philosophy, and high-stakes employment. And it's been there for nearly a hundred years. The fact that it's survived this long — despite real criticisms — tells you something about the hunger for measurable critical thinking. We want to quantify the unquantifiable.

Nova: : Or maybe we want a shortcut. Instead of evaluating someone's thinking over time, we give them a thirty-minute test.

Nova: That's the practical reality of recruitment. When a law firm gets thousands of applications for a handful of training contracts, they need a filter. The Watson-Glaser, for all its imperfections, is perceived as more objective than CV screening and more job-relevant than general intelligence tests. The question isn't whether it's perfect — it's whether it's better than the alternatives.

Nova: : And the alternatives are?

Nova: There are other critical thinking assessments — the Cornell Critical Thinking Test, the California Critical Thinking Skills Test, the Halpern Critical Thinking Assessment. But none have the market penetration of Watson-Glaser, especially in law. It has first-mover advantage combined with decades of normative data. Once a test becomes the industry standard, it's very hard to dislodge.

Strategies, Preparation, and the Coaching Industry

Beating the Test

Nova: Here's a curious thing: an entire industry has grown up around preparing people for the Watson-Glaser. JobTestPrep, AssessmentDay, The Corporate Law Academy, PrepTerminal — they all offer practice tests, video tutorials, strategy guides.

Nova: : Which raises an interesting question. If you can be coached to a higher score, is the test measuring critical thinking, or is it measuring how well you've been prepped for the test?

Nova: That's the classic test-coaching debate. Pearson would argue that practice improves familiarity with the format, not the underlying skill. But the prep companies teach specific algorithms for each section. For instance, for Recognition of Assumptions, they teach something called the Negative Test. You take the proposed assumption, negate it, and see if the original statement still makes sense. If the statement falls apart once the assumption is negated, the assumption is made. If the statement still stands, the assumption is not made.

Nova: : That's a technique. Once you know it, you're not necessarily a better critical thinker — you're just better at that specific question type.

Nova: Exactly. For Deduction, there's the NOT Triangle — Negative, Transpose, Only — a method for rephrasing premises without changing their logical meaning. For Evaluation of Arguments, there's the ITDN table — Important, True, Direct, New — a checklist for assessing argument strength.
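A checklist like ITDN can be pictured as four yes-or-no judgments about an argument. The sketch below is hypothetical: the class and function names are invented, and treating all four boxes as required for a Strong rating is an assumption of this example, not a rule stated by the prep providers. The judgments themselves still have to come from the test-taker.

```python
from dataclasses import astuple, dataclass


@dataclass
class ItdnCheck:
    """Four yes/no judgments a test-taker makes about one argument."""
    important: bool
    is_true: bool
    direct: bool
    new: bool


def rate_argument(check: ItdnCheck) -> str:
    # Assumption for this sketch: every box must be ticked for "Strong".
    return "Strong" if all(astuple(check)) else "Weak"


on_point = ItdnCheck(important=True, is_true=True, direct=True, new=True)
off_topic = ItdnCheck(important=True, is_true=True, direct=False, new=True)

print(rate_argument(on_point))   # Strong
print(rate_argument(off_topic))  # Weak
```

The second case mirrors the earlier point that an argument can be factually true and still weak if it is not directly relevant to the question.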

Nova: : So students are essentially learning the test's hidden rulebook, not becoming deeper thinkers.

Nova: Some would say both things happen. Learning to avoid common reasoning fallacies and to check assumptions carefully is genuinely useful. But yes, a significant portion of score improvement comes from cracking the test's specific logic rather than developing transferable critical thinking skills. And the stakes are high — aspiring lawyers spend weeks or months preparing, often paying for premium prep packages with hundreds of practice questions.

Nova: : What about time management? Forty-five seconds per question is brutal.

Nova: Time pressure is arguably the hardest part. Successful candidates learn to triage. The Inference section has only five questions, but each has five answer options, so it's complex and time-consuming. The Recognition of Assumptions section has twelve questions but only two answer options: Assumption Made or Assumption Not Made. Smart test-takers calibrate their pace accordingly. And most versions of the test allow you to skip and return to questions, so you don't get stuck on one and run out of time.

Nova: : Any final words of wisdom from the prep world?

Nova: The single most repeated piece of advice: base your answers strictly on the information given. Ignore everything you know about the real world. If the passage says the sky is green, then for the purposes of that question, the sky is green. Your job is not to be right in some absolute sense — it's to follow the logic of the text. That mental discipline, that ability to suspend your own knowledge, is actually a genuine critical thinking skill. It's just one that most people never practice.

Conclusion

Nova: So where does this leave us? The Watson-Glaser Critical Thinking Appraisal isn't a book by H. Watson — it's a psychological instrument created by Goodwin Watson and Edward Glaser, born from a 1930s experiment in teaching critical thinking, and now used globally to assess the reasoning skills of aspiring lawyers, consultants, and professionals.

Nova: : It's got five sections — Inference, Recognition of Assumptions, Deduction, Interpretation, and Evaluation of Arguments — all organized under the RED model: Recognize assumptions, Evaluate arguments, Draw conclusions. Forty questions, thirty minutes, and a percentile ranking that can open or close doors.

Nova: It's been criticized for ambiguous instructions and questionable construct validity. Yet it remains the industry standard, partly because it's been around so long, partly because its scores do correlate with performance in legal training, and partly because we simply don't have a better way to measure critical thinking at scale.

Nova: : And here's the uncomfortable truth: a test that's been around since 1949, that was originally designed for screening soldiers, is now determining who gets to be a lawyer at the world's most prestigious firms. That's a lot of weight for forty multiple-choice questions.

Nova: But maybe the real lesson of the Watson-Glaser isn't about the test at all. It's about the enduring, almost obsessive human desire to measure the mind — to make thinking visible and rankable. Goodwin Watson and Edward Glaser tapped into something fundamental. And nearly a century later, we're still wrestling with the same question they were: can you really test critical thinking?

Nova: : Or are we just testing how well someone can follow a very specific, somewhat flawed set of rules?

Nova: That's the tension. And it's not going away anytime soon. So if you're preparing for the Watson-Glaser, learn the rules, practice the algorithms, and remember — leave your real-world knowledge at the door.

Nova: : This is Aibrary. Congratulations on your growth!
