Problems with concussion testing in sports

Most currently available concussion tests rely on baseline scores. At the beginning of the season, athletes complete an assessment to establish their baseline cognitive health. When an athlete sustains a head injury or shows symptoms of a concussion, a trainer or coach administers the test again and compares the two sets of scores. If the post-injury scores are substantially lower than the baseline results, athletes are removed from play.

At first glance, this approach seems reasonable. Everyone's brain is different, so by comparing a potentially concussed individual to that individual's "healthy" state, you should be able to determine if there is a problem. But this method has some significant shortcomings. Let’s first take a look at two of the most common baseline-dependent concussion tests used today: ImPACT and BESS.

ImPACT
The Immediate Post-concussion Assessment and Cognitive Testing (ImPACT) is an FDA-listed protocol used by over 75% of NCAA programs. ImPACT uses both performance (such as memory tests and visual reaction time) and self-reported symptom severity (graded 0-6 for each of 22 possible symptoms) to assess and monitor concussions. If a baseline test is not available, an athlete's ImPACT score may be compared to population averages to determine the likelihood of a concussion.

BESS
The Balance Error Scoring System (BESS) is a method for detecting postural instability that can sometimes result from a concussion. Many trainers prefer BESS because it is fast, easy, and inexpensive to administer. A trained administrator observes the athlete standing in 3 different stances, first on a firm surface, then in the same 3 stances on a foam surface. The administrator counts the errors the athlete makes, with a maximum of 10 errors per stance on each surface. The number of errors is totaled across all 6 trials, for a maximum possible score of 60. Following a head injury, the athlete is re-tested. If the score has significantly increased from baseline, a concussion is inferred. However, many athletes who have sustained concussions do not have increased BESS scores.
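To make the scoring arithmetic concrete, here is a minimal Python sketch of how a BESS total could be tallied from the description above. The stance names, the function name bess_total, and the error counts are placeholders invented for illustration; they are not taken from an official BESS scoring sheet.

```python
# Illustrative sketch only: stance names and error counts are made up.
STANCES = ("stance_1", "stance_2", "stance_3")  # placeholder names for the 3 stances
SURFACES = ("firm", "foam")
MAX_ERRORS_PER_TRIAL = 10  # errors are capped at 10 per stance on each surface


def bess_total(errors):
    """Sum capped error counts over all 6 trials (3 stances x 2 surfaces).

    `errors` maps (surface, stance) -> number of errors observed.
    """
    total = 0
    for surface in SURFACES:
        for stance in STANCES:
            count = errors[(surface, stance)]
            if count < 0:
                raise ValueError("error counts cannot be negative")
            total += min(count, MAX_ERRORS_PER_TRIAL)
    return total


# Hypothetical baseline and post-injury observations for one athlete.
baseline = {
    ("firm", "stance_1"): 0, ("firm", "stance_2"): 2, ("firm", "stance_3"): 3,
    ("foam", "stance_1"): 2, ("foam", "stance_2"): 5, ("foam", "stance_3"): 6,
}
post_injury = {
    ("firm", "stance_1"): 1, ("firm", "stance_2"): 3, ("firm", "stance_3"): 4,
    ("foam", "stance_1"): 3, ("foam", "stance_2"): 12, ("foam", "stance_3"): 7,
}

print(bess_total(baseline))     # 18
print(bess_total(post_injury))  # 28 (the 12 is capped at 10)
print(bess_total(post_injury) - bess_total(baseline))  # 10
```

A higher total means more balance errors, so an increase from baseline is what would point toward a concussion.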

What’s Wrong with Baseline Testing?
At many schools, athletes only complete baseline testing (for whatever test(s) that particular school uses) at the beginning of their freshman year. That same baseline score is then used through all 4 years of college.

College athletes train daily to improve their speed, balance, physical strength and agility. One would expect (or at least hope!) that an athlete's core strength and balancing skills would improve over 4 years of training at a high level. Yet if a head injury occurs during the athlete’s senior year, their post-injury BESS score will be compared back to their wobbly freshman-year baseline.

It's also possible for athletes to game the system. An athlete may intentionally stumble or make mistakes during baseline testing, allowing them greater wiggle room, as it were, during a post-injury test. As a former student-athlete, I know that a competitive atmosphere can make people very short-sighted. Participating in the game/match/meet/practice today always seems much more important than preventing long-term damage. It's tempting to intentionally make mistakes (or at least not give your best effort) during baseline testing, in the hope of returning to play more quickly after an injury.

At the middle- and high-school level, the efficacy of baseline testing is severely diminished due to natural growth and development. According to a recent study (Abeare et al. 2018), ImPACT baseline scores correlate strongly with age (see figure below). Using data from 7,897 baseline tests from individuals between the ages of 10 and 21, researchers found that baseline scores improved with age, and also that the scores for younger individuals were highly inaccurate. Depending on the Embedded Validity Indicator (EVI) (a system used to determine the normative range), up to 84% of healthy 10-year-olds were classified as "concussed".

One could argue that baseline testing eliminates this problem: a child could be compared back to his or her own scores after an injury. But note on the graph below how quickly the Base Rate of Failure (BRF) decreases with age. Baseline testing would have to occur extremely frequently in order to accurately identify a concussed individual. Even college-aged individuals showed significant improvement in BRF as they aged for 2 of the 5 EVIs. With ImPACT's per-test pricing scheme, it can quickly become quite expensive to administer adequate baseline tests to athletes at a given school.

Going back to the point about comparing to normative data when no baseline is available, the graph above makes it hard to believe that such a comparison would be accurate. Abeare et al. (2018) found that across all age groups, 55.7% of healthy subjects sampled failed at least one of four indicators of concussion. Thus, “normative” data is only normal for less than half of the population! Comparison to the “Default ImPACT” EVI (bottom line) would yield fewer false positive results, but that also means the false negative rate would be higher.

As for BESS, we found that 59% of college athletes tested within 1 day post-concussion scored at least as well as they did on their baseline test. In fact, on average, athletes scored 6% better on the test immediately after sustaining a concussion.

These staggering statistics could be due to the high inter- and intra-rater variability of the test. One study (Finnoff et al. 2009) found that the minimum detectable change in total BESS score was 9.4 points when different raters scored the same athlete at the same time, and 7.3 points when the same rater scored the same athlete in different test sessions. Assuming the same athletic trainer scored the baseline and post-concussion test for each of the athletes in our dataset, only 2 of 51 athletes could be conclusively classified as concussed on Day 1 following a head injury. Even with a baseline test, it is extremely unlikely that a concussion will be correctly diagnosed using BESS.
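The minimum detectable change figures can be read as thresholds: an increase in BESS score smaller than the threshold cannot be distinguished from rater noise. The Python sketch below shows that check using the two values reported by Finnoff et al. (2009). The function name exceeds_mdc and the example scores are hypothetical, and treating a change as "conclusive" only when it is strictly greater than the threshold is an assumption of this sketch, not a rule quoted from the study.

```python
# Minimum detectable change (MDC) in total BESS score, from Finnoff et al. (2009).
MDC_INTERRATER = 9.4  # different raters scoring the same athlete
MDC_INTRARATER = 7.3  # same rater scoring the same athlete in different sessions


def exceeds_mdc(baseline_total, post_total, same_rater):
    """Return True if the increase in errors is larger than the relevant MDC.

    Assumption for this sketch: the change must be strictly greater than
    the MDC to count as a real worsening rather than scoring noise.
    """
    mdc = MDC_INTRARATER if same_rater else MDC_INTERRATER
    return (post_total - baseline_total) > mdc


# Hypothetical athlete: baseline total of 18 errors, post-injury total of 26.
print(exceeds_mdc(18, 26, same_rater=True))   # True: a change of 8 is above 7.3
print(exceeds_mdc(18, 26, same_rater=False))  # False: a change of 8 is below 9.4
```

Under this reading, the same 8-point worsening counts as a detectable change only if the same person scored both tests.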

How is Brain Gauge Different?
Fortunately, there is a better way to assess concussions without the need for a baseline test and without the other complications listed above. One of the strengths of using a Brain Gauge in concussion testing is that “baseline” scoring can be done post-injury. Instead of comparing results of the same test pre- and post-injury, comparisons can be made between two different (but similar) tests done in the same session, and the ratio of scores is an indicator of injury. More details can be found here, but briefly, an injury to the brain can cause alterations in lateral inhibition, functional connectivity, feed-forward inhibition, and neuroinflammation. Each of these can be measured with the Brain Gauge by comparing two conditions of amplitude discrimination, TOJ, threshold, and duration discrimination, respectively.

Wait, wouldn’t that ratio be a “normative value” and didn’t we just say that there were problems with comparing to normative results? Yes, but there are differences here. Instead of being based on subjective psycho-social factors like ImPACT, or very subjective scoring as with BESS, Brain Gauge scores are biologically based objective measures of how well different areas of your brain are functioning and communicating. Thus our “normative” data is based on optimal neurological functioning, and while other pre-existing conditions (e.g., migraines) may confound Brain Gauge results post-concussion, we’ve found that different populations generally group together in terms of overall corticalmetrics score (see previous post here). Baseline testing in that instance could be useful, but again, is not necessary. As stated above, athletes are generally much more motivated to perform well when returning to play is at stake than when taking a baseline test. While it is difficult to intentionally perform poorly on most Brain Gauge tests, we have observed healthy athletes immediately cutting their Reaction Time score in half when given an incentive.

Direct comparison of Brain Gauge results with ImPACT. The two graphs at the top display two of the ImPACT measures – the symptom score and the cognitive efficiency index – from a study that we completed. Note that concussed individuals took approximately 7 days to return to baseline on the symptom score but did not show a significant decline on the second metric, the cognitive efficiency index. A plot of one of the Brain Gauge scores, the lateral inhibition metric, is displayed on the bottom left and the overall corticalmetrics score is plotted on the bottom right. In both cases, values remain well above baseline for 21-28 days, which fits very well with physiological data from both human and animal concussion studies. Thus, the Brain Gauge appears to be a bit more sensitive in tracking recovery from concussion.

No test is going to be 100% accurate 100% of the time since everyone is different and has their own set of confounding factors. However, some tests are better than others. Since the Brain Gauge is based on biological functioning of your brain and gives multiple quantifiable scores of brain health, you are better able to escape the subjectivity of other commonly used concussion assessment tests.

References
Abeare CA, Messa I, Zuccato BG, Merker B, Erdodi L. Prevalence of invalid performance on baseline testing for sport-related concussion by age and validity indicator. JAMA Neurology. 2018 Mar 12.

Finnoff JT, Peterson VJ, Hollman JH, Smith J. Intrarater and interrater reliability of the Balance Error Scoring System (BESS). PM&R. 2009 Jan 1;1(1):50-4.