We had a couple of threads on Stroud and the S2 test before the draft. Here's some background from the scientists who developed the test:
They Created a Test to Identify Star QBs. How Did It Miss the Best One in Years?
C.J. Stroud has carried the Houston Texans to the NFL playoffs as a rookie despite flunking a cognitive exam before the draft. The scientists behind the test explain why it’s misunderstood
By Andrew Beaton and Louise Radnofsky
Jan. 18, 2024 5:30 am ET
Quarterback C.J. Stroud had a near-sterling résumé ahead of last year's NFL draft. He possessed a phenomenal arm, a veteran's presence in the pocket, and a bunch of gaudy statistics from his days at Ohio State.
But shortly before draft day, a bright red flag emerged: Stroud had reportedly bombed a cognitive test that has been lauded for its uncanny ability to identify successful pros. In the freakout that followed, some pundits were convinced it meant that he was a surefire bust.
These days, any concerns about how Stroud performed on a neuroscience exam feel completely absurd. The 22-year-old, who was selected by the Texans with the No. 2 pick last April, just produced one of the best seasons ever for a rookie quarterback. Then he topped it by leading Houston to a 45-14 demolition of the Cleveland Browns in the opening round of the playoffs. If the Texans upset the No. 1 seed Baltimore Ravens this weekend to reach the AFC Championship, Stroud will likely be the main reason why.
The S2 evaluation, which is used everywhere from sports scouting to special forces training, has been touted for its accuracy in identifying which quarterbacks will succeed in the NFL. Patrick Mahomes and Joe Burrow both aced it. When it emerged that Brock Purdy, the out-of-nowhere star selected by the 49ers with the last pick of the 2022 draft, had recorded an elite S2 score, the test went mainstream. Suddenly, a wonky cognitive evaluation was seen as the solution to one of the most mystifying problems in sports: how to evaluate NFL quarterback prospects.
Then Stroud’s brilliant play completely upended the narrative.
“I’m not a test taker,” Stroud said last April. “I play football.”
In the wake of Stroud’s play this year, the S2 Cognition test has been ridiculed for whiffing on the best quarterback prospect to enter the NFL in years. That is unsurprisingly aggravating to the performance evaluation company’s two founders, a pair of cognitive neuroscientists named Brandon Ally and Scott Wylie. They say that Stroud’s result should never have leaked and that as soon as his score came in, it was flagged as potentially invalid and an unreliable result.
The debate over the S2's value also strikes at the core of one of the most lucrative questions in professional sports. In an industry increasingly flooded with data, a test that could precisely calculate the capabilities of athletes would be a panacea. But it isn't yet a reality. And when the S2 exploded in popularity because of its early successes, that also led to a misinterpretation of what S2 says is its actual purpose: to tell coaches and general managers how an athlete is wired.
“There’s nothing on the planet that’s going to be a crystal ball,” says Ally, S2’s co-founder. “We can’t predict success.”
The best way to understand how its success has been distorted is to understand its roots. Ally was a track-and-field athlete at the University of Tennessee before he pursued his scientific interests with a Ph.D., a postdoctoral fellowship and then another five-year fellowship involving research on Alzheimer’s disease. That happened to include testing a patient’s ability to track objects as a way of determining whether it was safe for them to continue driving. But when he watched the NFL draft and heard analysts say things like someone “plays faster than his foot speed,” it made him curious. Could the same evaluations explain the differences between a player’s physical abilities and his actual performance? Wylie, who pursued a similar career path looking at Parkinson’s, had the same thought.
First, Ally and Wylie fine-tuned their methodology on football players at Louisiana State University. They didn’t invent the individual tests that make up the S2—they were using ones that had a track record in clinical research. The difference was they were adapting them to better understand performance on the gridiron. They told the coaches the strengths and weaknesses that the tests indicated for an individual athlete, and asked whether they matched what the coaches saw every day. Then they compared their findings to scouts’ grades.
Attempting to probe the mental acuity of future NFL players was nothing new. For decades, the Wonderlic test was viewed as the gold standard for sizing up the intellectual capacity of players at the game’s most cerebral position: quarterback. Players’ results on the Wonderlic were measured alongside their time running the 40-yard dash at the scouting combine.
“We’re always facing the ghost of the Wonderlic,” Ally says.
Unlike the Wonderlic, which is roughly equivalent to an IQ test, S2’s exam doesn’t measure what Ally calls “cerebral horsepower” or the tools that it might take to be successful in an academic setting. Instead, it runs a series of exams that measure traits such as pattern recognition and impulse control. Then it spits out results, placing players into a percentile for nine different categories of cognitive processes.
The exam is nothing like a traditional multiple-choice standardized test. Test takers are instead shown a series of prompts and have to give their answers quickly. When a shape with a missing corner briefly flashes on the screen, they have to identify which corner was missing. In a way, it replicates the type of split-second decisions athletes have to make on the field—whether that’s to throw the ball to a particular wide receiver or judge if an incoming pitch is a fastball or slider.
Since S2 was founded in 2015, its popularity has boomed in the sports world and especially the NFL, where draft-eligible players are studied in intense detail every year at the annual scouting combine. It gained national acclaim when it was touted for its run of identifying future stars—even improbable ones like Purdy. These days, half the teams in the league work with the company.
But Ally cautions that success in the NFL doesn’t rest entirely on a player’s cognitive abilities. It’s just one factor in a much larger equation that includes everything from physical skills to mental toughness. S2 isn’t responsible for measuring any of those. And for the same reason that wide receivers who run the 40-yard dash in 4.2 seconds can flame out of the league, players who ace the S2 can flop, too.
“That’s the 99th percentile in a physical metric,” Ally says. “If you scored the 99th percentile on a cognitive metric, does that mean you’re going to make it? No, it does not.”
None of those caveats prevented a media inferno when it came out last spring that Stroud had scored in the 18th percentile, while Alabama's Bryce Young earned an elite grade. S2's own research says that players who score below the 20th percentile have a worse chance of performing well and earning a multiyear second contract. The consternation over Stroud's result, though, missed one crucial piece of context: S2 says it registered as potentially invalid.
Of the nearly 1,000 NFL prospects who take the S2 every year, a dozen or two get flagged by the company for producing an unreliable result. That could be because the player was clearly distracted while taking the exam or tired from the various demands placed on players during the pre-draft process. Stroud's score was one of those flagged, S2 says.
When it came to draft day, the Carolina Panthers ultimately chose to spend the No. 1 pick on Young, who had nailed the S2 test. The Texans were up next. They took Stroud.
Now that order looks completely out of whack. While Young had a poor rookie season, Stroud was the one who played like a natural. He threw for 23 touchdowns, led all qualifying quarterbacks in passing yards per game and posted the league’s lowest interception rate. He’s responsible for giving Houston a chance at advancing to the AFC Championship—and raising questions about a neurocognitive assessment that tabbed him as a potential bust.
“Let’s say we miss 20% of the time. If our standard has to be we can’t miss ever, or we can’t miss on one player, man, that’s tough,” Ally says. “I don’t know anybody in sports who’s that good.”