Before you read any further, grab a glass of your favorite beverage and set it down (no drinking yet). Done? Now, reach both hands around your back and touch your pinkies together. Then quickly take a sip of your drink. Go on.
Did you do it? If so, the next time you find yourself in a similar environment you will have a greater chance of spontaneously repeating this round-the-back pinkie act. Although that possibility may sound strange, your brain is actually programmed to reinforce actions that are immediately followed by rewards. This is especially true when the reward is unexpected (you probably did not expect to have a treat when you began to read this article).
Although most of us feel like we are in control of our actions, many of those actions can also be explained by principles of learning that are embedded in our neural machinery. Of course, this machinery is inordinately intricate and complex, involving several interacting systems, each with millions of neurons, billions of connections, and multiple neurotransmitters, all evolving dynamically as a function of genes, time, past experience, and current environment. But neuroscience is shedding light on how circuits linking two parts of the brain, the basal ganglia and the frontal cortex, contribute to learning both productive and counterproductive behaviors, and even to some neurological disorders. Those circuits can, for example, help account for genetically driven individual differences in whether we learn best from positive or negative reinforcement, and understanding them provides insights into decision making in people with Parkinson’s disease, attention-deficit hyperactivity disorder, and addictions.
Basal Ganglia Basics
The basal ganglia are a collection of interconnected areas deep below the cerebral cortex. They receive information from the frontal cortex about behavior that is being planned for a particular situation. In turn, the basal ganglia affect activity in the frontal cortex through a series of neural projections that ultimately go back up to the same cortical areas from which they received the initial input. This circuit enables the basal ganglia to transform and amplify the pattern of neural firing in the frontal cortex that is associated with adaptive, or appropriate, behaviors, while suppressing those that are less adaptive. The neurotransmitter dopamine plays a critical role in the basal ganglia in determining, as a result of experience, which plans are adaptive and which are not.
Evidence from several lines of research supports this understanding of the role of basal ganglia and dopamine as major players in learning and selecting adaptive behaviors. In rats, the more a behavior is ingrained, the more its neural representations in the basal ganglia are strengthened and honed.1 Rats depleted of basal ganglia dopamine show profound deficits in acquiring new behaviors that lead to a reward. Experiments pioneered by Wolfram Schultz, M.D., Ph.D., at the University of Cambridge have shown that dopamine neurons fire in bursts when a monkey receives an unexpected juice reward.2 Conversely, when an expected reward is not delivered, these dopamine cells momentarily stop firing; that is, their firing rates “dip” below their normal baseline. These dopamine bursts and dips are thought to drive changes in the strength of synaptic connections (the neural mechanism for learning) in the basal ganglia, so that actions are reinforced (in the case of dopamine bursts) or punished (in the case of dopamine dips).
In 1996, Read Montague, Ph.D., and colleagues showed that these patterns of dopamine firing bear a striking resemblance to learning signals developed independently by artificial intelligence researchers.3 The researchers created a program that would “train” a computer to discover on its own the best sequence of actions needed for the computer program to obtain a simulated reward, such as an endpoint in an artificial maze. The program tries to “predict” when rewards are likely. When this prediction is wrong, the resulting dopaminelike “prediction errors” are used as a learning signal to improve future predictions, which are, in turn, used to modify subsequent actions by the computer. The same computer program has been used for purposes as diverse as learning an optimal strategy for playing computerized backgammon and accounting for foraging behavior in honeybees.
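The prediction-error idea can be sketched in a few lines of Python. This is a minimal temporal-difference-style illustration, not the program from the 1996 paper: the function name learn_values, the five-step corridor ending in a reward, and the parameter values alpha and gamma are all assumptions chosen for illustration.

```python
def learn_values(n_states=5, episodes=200, alpha=0.1, gamma=1.0, reward=1.0):
    """Learn reward predictions along a corridor that ends in a reward.

    values[s] is the predicted future reward from state s. The corridor,
    the names, and the parameter values are illustrative assumptions.
    """
    values = [0.0] * (n_states + 1)  # extra terminal state, value stays 0
    first_errors, last_errors = [], []
    for episode in range(episodes):
        errors = []
        for s in range(n_states):
            r = reward if s == n_states - 1 else 0.0  # reward only at the end
            # Prediction error: what happened minus what was predicted.
            delta = r + gamma * values[s + 1] - values[s]
            values[s] += alpha * delta  # nudge the prediction toward the outcome
            errors.append(delta)
        if episode == 0:
            first_errors = errors
        last_errors = errors
    return values[:n_states], first_errors, last_errors


values, first, last = learn_values()
print(first[-1])         # large error: the reward was unexpected
print(last[-1])          # near zero: the reward is now predicted
print(0.0 - values[-1])  # negative error if the predicted reward is omitted
```

The three printed numbers loosely mirror the burst to an unexpected reward, the absence of a burst once the reward is predicted, and the dip when a predicted reward fails to arrive.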
At this point you might be thinking, “Okay, those are rats, monkeys, bees, and automated computer programs. We humans are much more sophisticated than that.” But in truth, ample evidence shows that these very same principles of positive and negative reinforcement are relevant in humans, although other cognitive systems complement and can sometimes override this primitive system. Using functional neuroimaging, Samuel McClure, Ph.D., and colleagues at Princeton University showed that humans activated reward areas of the basal ganglia, which are heavily enriched with dopamine, when receiving unexpected rewards.4 This was true regardless of whether humans received concrete rewards, such as juice (as in the monkey studies), or more-abstract rewards, such as money. Others have since found that the same basal ganglia areas were activated even when study participants simply received visual feedback informing them whether they were correct or incorrect in a cognitive task. Further evidence comes from a 1996 study by Barbara Knowlton, Ph.D., and her colleagues at the University of California, Los Angeles. They found that people with Parkinson’s disease, whose basal ganglia dopamine levels are severely depleted as a result of cell death, show specific deficits in exactly this kind of trial-and-error learning from feedback. Thus humans, too, recruit their “primitive” reinforcement learning system in the basal ganglia to support behavior in more-complex cognitive tasks.
How Do the Basal Ganglia Learn?
While these studies and other evidence point to the critical role of the basal ganglia dopamine system in learning about the consequences of one’s actions, it is another question altogether to ask “how.” Unfortunately, any medical resident or neuroscience graduate student will tell you that the circuitry linking various neural subregions that collectively form the basal ganglia is so complex and seemingly convoluted that trying to piece the puzzle together can make your head spin.
That problem is precisely why the development of computer models is essential. These models enable researchers to simulate various anatomical and physiological pieces of data, using mathematical equations that capture how groups of neurons communicate activity to other neurons within and between brain areas. By incorporating aspects of neuronal physiology and connectivity that are specific to the basal ganglia into a computer model, we can examine what happens when all of this is put together and the computer model is allowed to evolve dynamically as a result of the input it receives.
Many attempts have been made to model basal ganglia function. Although the models have tackled different levels of analysis, from molecular-level to systems-level interactions, several of them have converged on the same core idea: that the architecture of this seemingly convoluted system is particularly well suited to support “action selection”—that is, to implicitly weigh all available options for what to do next and to choose the best one. Intriguingly, the “actions” that can be selected range from simple motor behaviors (for example, touching your pinkies together) to manipulation of information in memory, such as multiplying 42 by 17 in your head. Although these seem like fundamentally different problems, they share the core abstract action selection problem. That is, just as we have to select a single motor action (or sequence of actions) to “beat out” all other possible actions, we also have to know which piece of information is relevant to update and store into memory so that it takes precedence over other, potentially distracting thoughts.
It seems, then, that our ability to think, reason, and manipulate memories evolved from similar mechanisms that allow an animal to perform impressive sequences of motor actions, like when a bird swoops down to catch a fish. The key difference between these cognitive and motor functions may lie in the specializations of the different regions of the frontal cortex and the actions that each encodes. For instance, motor actions are encoded in the motor cortex, whereas a prefrontal cognitive action involves updating information to be actively held in your mind for a period of time, while you continue to process other incoming sensory information. Notably, the circuits that link parts of the basal ganglia to motor cortical areas are structurally identical to those linking other parts of the basal ganglia to the regions of the prefrontal cortex that are used for cognitive processes. Thus the basal ganglia can play a similar role in selecting among both motor and cognitive actions, by interacting with different parts of the frontal cortex.
Finding “Go” and “NoGo”
Building on a large body of earlier theoretical work, my colleagues and I developed a series of computational models that explore the role of the basal ganglia when people select motor and cognitive actions. We have been focusing on how dopamine signals in the basal ganglia, which occur as a result of positive and negative outcomes of decisions (that is, rewards and punishments), drive learning. This learning is made possible by two main types of dopamine receptors, D1 and D2, which are associated with two separate neural pathways through the basal ganglia.5 When the “Go” pathway is active, it facilitates an action directed by the frontal cortex, such as touching your pinkies together. But when the opposing “NoGo” pathway is more active, the action is suppressed. These Go and NoGo pathways compete with each other when the brain selects among multiple possible actions, so that an adaptive action can be facilitated while at the same time competing actions are suppressed. This functionality can allow you to touch your pinkies together rather than perform another potential action (such as scratching an itch on your neck), or to concentrate on a math problem instead of daydreaming.
But how does the Go/NoGo system know which action is most adaptive? One answer, we think (and as you might have guessed), is dopamine. During unexpected rewards, dopamine bursts drive increased activity and changes in synaptic plasticity (learning) in the Go pathway. When a given action is rewarded in a particular environmental context, the associated Go neurons learn to become more active the next time that same context is encountered. This process depends on the D1 dopamine receptor, which is highly concentrated in the Go pathway. Conversely, when desired rewards are not received, the resulting dips in dopamine support increases in synaptic plasticity in the NoGo pathway (a process that depends on dopamine D2 receptors concentrated in that pathway). Consequently, these nonrewarding actions will be more likely to be suppressed in the future.
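The burst-and-dip learning rule just described can be written as a toy bookkeeping sketch. It is far simpler than a neural network model of the basal ganglia and is offered only as an illustration: the function names, the single pair of weights per action, and the learning rate are assumptions.

```python
def update_go_nogo(go, nogo, dopamine, lr=0.1):
    """Return updated (go, nogo) weights for one action after feedback.

    dopamine > 0 stands for a burst (unexpected reward); dopamine < 0
    stands for a dip (an expected reward that did not arrive).
    """
    if dopamine > 0:
        go += lr * dopamine      # burst: strengthen Go (D1-dependent in the text)
    elif dopamine < 0:
        nogo += lr * -dopamine   # dip: strengthen NoGo (D2-dependent in the text)
    return go, nogo


def action_strength(go, nogo):
    """Net tendency to perform the action: Go support minus NoGo suppression."""
    return go - nogo


# An action that is rewarded five times in a row in the same context.
go, nogo = 0.0, 0.0
for _ in range(5):
    go, nogo = update_go_nogo(go, nogo, dopamine=1.0)
print(action_strength(go, nogo))    # positive: the action is facilitated

# A different action that is followed by a dopamine dip.
go2, nogo2 = update_go_nogo(0.0, 0.0, dopamine=-1.0)
print(action_strength(go2, nogo2))  # negative: the action is suppressed
```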
My colleagues and I demonstrated that a computer model of these Go/NoGo signals (and the spread of these signals through the rest of the basal ganglia circuit) can learn to produce actions that are most likely to lead to reward in the long run.6 Similarly, the same dopamine reinforcement learning processes can be extended to reinforce cognitive actions that are essential intermediate steps, for example, doing the arithmetic that is needed to achieve the longer-term goal of preparing your taxes. The process also “punishes” distracting thoughts (“What’s for dinner?”), allowing you to stay on task. And it enables complex cognitive working memory operations, such as remembering the figures as you multiply them, to be executed more swiftly and efficiently with practice.
Positive and Negative Learners
This theoretical framework, which integrates anatomical, physiological, and psychological data into a single coherent model, can go a long way in explaining changes in learning, memory, and decision making as a function of changes in basal ganglia dopamine. In particular, this model makes a key, previously untested, prediction that greater amounts of dopamine (via D1 receptors) support learning from positive feedback, whereas decreases in dopamine (via D2 receptors) support learning from negative feedback.
To test these ideas, we developed a computer “game” that requires learning from both positive and negative decision outcomes. We tested healthy college students who were given low doses of three different drugs: a drug that enhances the release of dopamine, a drug that reduces the release of dopamine, and a placebo; each student was tested in each of the three conditions. Participants in the study viewed pairs of symbols on a computer screen and were told to choose one of the symbols in the pair by pressing a left or right key on a keyboard. We gave the students no explicit rule for knowing which symbols to select, but after they made their choice they did receive feedback that told them whether the choice was right or wrong, so that they could learn from trial and error.
The trick was that the feedback was somewhat random—it was not always the same for each choice—so it was impossible always to make the right choice. In the most reliable pair, “AB,” choosing symbol A led to positive feedback on 80 percent of trials, whereas choosing symbol B led to 80 percent negative feedback. The results of other choices (“CD” and “EF”) were less consistent and more random. We hypothesized that all participants would learn to choose A over B, but that they would do so on different bases, depending on which drug they had been given. When dopamine levels are elevated, we hypothesized, participants should learn to choose symbol A, which had received the most positive feedback (that is, they should learn “Go” to A). But they should be relatively impaired in learning to avoid (NoGo) symbol B: the drug-induced elevations in dopamine would prevent dopamine dips that would normally support this negative feedback learning. In contrast, the drug that reduces dopamine levels should lead to reduced Go learning but relatively enhanced NoGo learning for avoiding symbol B.
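To make the probabilistic feedback concrete, here is a small sketch of how such a schedule could be generated. Only the 80/20 split for the AB pair comes from the description above; the values for the CD and EF pairs, along with every name in the code, are assumed placeholders for "less consistent" pairs.

```python
import random

# Probability of positive feedback after choosing each symbol.
# A and B follow the text; C, D, E, and F are assumed values.
P_POSITIVE = {"A": 0.8, "B": 0.2, "C": 0.7, "D": 0.3, "E": 0.6, "F": 0.4}
TRAINING_PAIRS = [("A", "B"), ("C", "D"), ("E", "F")]


def give_feedback(symbol, rng):
    """Return True for positive feedback, False for negative feedback."""
    return rng.random() < P_POSITIVE[symbol]


def observed_positive_rate(symbol, n_trials=10000, seed=0):
    """Estimate how often choosing a symbol is followed by positive feedback."""
    rng = random.Random(seed)
    hits = sum(give_feedback(symbol, rng) for _ in range(n_trials))
    return hits / n_trials


for symbol in "AB":
    print(symbol, observed_positive_rate(symbol))
```

Over many trials the observed rates settle near the programmed probabilities, but on any single trial even the best choice can be followed by negative feedback.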
To distinguish between these choose-A and avoid-B learning strategies, we conducted an additional test phase, in which participants faced the choice of new combinations. Symbol A was re-paired with other, more neutral symbols, and in other trials, symbol B was paired with these same symbols. Participants were told to just use “gut-level” intuition based on their prior learning to make these new choices, and no feedback was provided. We reasoned that to the extent that an individual had learned Go to A, he would reliably choose symbol A in all test pairs in which it was present. Conversely, if he instead learned NoGo to B, he would more reliably avoid symbol B in all test pairs in which it was present.
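The two test-phase measures can be computed as follows. This is a sketch of the scoring logic only: the trial format (a tuple of two options and the chosen one) and the function name are assumptions, as is the decision to skip any trial that pairs A directly with B, since such a trial cannot separate the two strategies.

```python
def score_test_phase(trials):
    """Compute choose-A and avoid-B accuracy from test-phase choices.

    trials: list of (option_1, option_2, chosen) tuples.
    Returns (choose_a, avoid_b) as proportions, or None where no
    relevant trials were present.
    """
    a_total = a_chosen = b_total = b_avoided = 0
    for first, second, chosen in trials:
        pair = {first, second}
        if pair == {"A", "B"}:
            continue  # assumed: the original training pair is not scored
        if "A" in pair:
            a_total += 1
            a_chosen += chosen == "A"
        if "B" in pair:
            b_total += 1
            b_avoided += chosen != "B"
    choose_a = a_chosen / a_total if a_total else None
    avoid_b = b_avoided / b_total if b_total else None
    return choose_a, avoid_b


# A made-up participant who is better at choosing A than at avoiding B.
example = [
    ("A", "C", "A"), ("A", "D", "A"), ("A", "E", "E"), ("A", "F", "A"),
    ("B", "C", "C"), ("B", "D", "B"), ("B", "E", "E"), ("B", "F", "B"),
]
print(score_test_phase(example))  # (0.75, 0.5)
```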
We found a striking effect of the different dopamine medications on this positive versus negative learning bias, consistent with predictions from our computer model of the learning process. While on placebo, participants performed equally well at choose-A and avoid-B test choices. But when their dopamine levels were increased, they were more successful at choosing the most positive symbol A and less successful at avoiding B. Conversely, lowered dopamine levels were associated with the opposite pattern: worse choose-A performance but more-reliable avoid-B choices. Thus the dopamine medications caused participants to learn more or less from positive versus negative outcomes of their decisions.
These research discoveries raise the intriguing question of whether individual differences in learning from positive versus negative outcomes of decisions can be found even in nonmedicated healthy people. Indeed, although on average our study participants taking the placebo showed roughly equal choose-A and avoid-B performance, individual participants still performed better at one or the other; we refer to these subgroups as positive or negative learners. An initial study showed that these learners differed in the extent to which their brains responded to reinforcement feedback, as measured by brain electrical activity. Negative learners showed greater neural sensitivity to negative feedback. In principle, it is possible that these behavioral learning biases, and their neural correlates, arise from a combination of psychological, cultural, and experiential factors. Still, we reasoned that they could (at least in part) stem from individual differences in basal ganglia dopamine function, which in turn may be controlled by genetics.
Modern genetic techniques enable us to identify individual genes that specifically control basal ganglia dopamine. We therefore collected DNA (using a simple salivary cheek swab) from 69 college students who were tested with the same positive/negative learning procedure.7 We looked for a gene coding for DARPP-32, a protein known to control basal ganglia dopamine efficacy and previously shown to mediate dopamine D1 receptor effects on synaptic plasticity in animals. Notably, the presence of a common mutation in this gene accounted for a substantial proportion of the relative positive versus negative learning biases in our study participants. Furthermore, a mutation in another gene previously shown to control the density of (NoGo) D2 receptors in the basal ganglia predicted the extent to which participants learned from negative decision outcomes. Together, these results provide more-specific confirmation of our model’s suggestion that Go and NoGo learning depends on the D1 and D2 receptors.
How Might This Apply in Parkinson’s Disease?
What does this Go and NoGo learning framework suggest for people with basal ganglia dysfunction? Parkinson’s disease offers one of the clearest instances of such dysfunction, since the loss of dopamine in the basal ganglia of people with the disease is well understood. Parkinson’s therefore provides a good opportunity to test whether the model’s account for how these brain systems learn and decide is plausible, and suggests implications for cognitive deficits stemming from the systems’ malfunction.
In a seminal paper by Barbara Knowlton (mentioned earlier) and her colleagues, people with Parkinson’s were tested with a “weather prediction” task. In this experiment, patients and healthy participants were given four decks of playing cards and were told to guess whether the particular combination of cards in front of them would predict “rain” or “sun.” After each guess, feedback let them know whether or not they were correct. The relationship between the cards and the “weather” outcome was complex; no simple rule determined whether any particular card combination would lead to rain or sun, and because the feedback was somewhat random, it was impossible always to predict correctly (much like the actual weather).
Despite being unable to explicitly state the basis of their choices, healthy study participants implicitly integrated the reinforcement feedback over multiple trials and became progressively more accurate at predicting sun or rain. In contrast, people with Parkinson’s showed very little evidence of this implicit learning. Subsequent studies demonstrated that this difficulty did not occur if the Parkinson’s patients were instead shown the correct answer in every trial and did not have to learn from the consequences of their actions. While it is often assumed that Parkinson’s disease affects only motor function, this and several other recent studies confirm that Parkinson’s disease is a complex neuropsychiatric condition that clearly has cognitive effects.
To learn to discriminate between subtly different action-outcome contingencies, the basal ganglia appear to require a healthy range of dopamine bursts and dips to support both Go and NoGo learning. Our model suggests that deficits in implicit learning, such as in the weather prediction study, result from a reduced range of dopamine signals. Indeed, when Parkinson’s disease was simulated in our computer model (by reducing the level of dopamine in the simulated basal ganglia), the model showed impaired learning in the weather prediction task similar to what was seen in the Parkinson’s patients.6
One might, then, logically predict that these cognitive learning deficits would be improved by medications that elevate brain dopamine. But while such medication does improve some aspects of cognition, it can actually further impair or even cause some types of learning deficits. This counterintuitive finding is naturally explained by our basal ganglia model. These medications artificially elevate brain dopamine levels and improve motor deficits of the disease, by shifting the balance from too much NoGo to more Go.5 But this same effect can prevent patients from learning from negative feedback: the dips of dopamine required to learn NoGo are effectively “filled in” by the medication. In essence, the medication prevents the brain from naturally and dynamically regulating its own dopamine levels, which has a detrimental effect on learning, particularly when dopamine levels should be low, as for negative decision outcomes. This notion might explain why some medicated Parkinson’s patients develop pathological gambling behaviors, which could result from enhanced learning from gains together with an inability to learn from losses.
To test this idea, we presented people with Parkinson’s disease with the same choose-A/avoid-B learning task, once while they were on their regular dose of dopamine medication and once while they were off it.8 As predicted, patients who were off the medication were relatively impaired at learning to choose the most positive stimulus A, but showed intact or even enhanced learning of avoid-B. Dopamine medication reversed this bias, improving choose-A performance but impairing avoid-B. This discovery supports the idea that medication prevents dopamine dips during negative feedback and impairs learning based on negative feedback.
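The way a shifted dopamine level could tilt learning toward Go or NoGo can be illustrated with a toy calculation. This is a deliberately crude sketch, not the simulation reported in reference 6: the idea of adding a constant tonic_shift to every dopamine signal, the size of that shift, and the sequence of outcomes are all assumptions made for illustration.

```python
def learn(prediction_errors, tonic_shift=0.0, lr=0.1):
    """Accumulate Go and NoGo learning over a series of outcomes.

    tonic_shift is an assumed constant added to every dopamine signal:
    negative for a dopamine-depleted state, positive for medication.
    """
    go = nogo = 0.0
    for error in prediction_errors:
        signal = error + tonic_shift
        if signal > 0:
            go += lr * signal      # burst: Go learning
        elif signal < 0:
            nogo += lr * -signal   # dip: NoGo learning
    return go, nogo


# Outcomes that alternate between better and worse than expected.
outcomes = [0.5, -0.5, 0.5, -0.5]

print(learn(outcomes))                    # balanced Go and NoGo learning
print(learn(outcomes, tonic_shift=-0.5))  # depleted: bursts vanish, NoGo only
print(learn(outcomes, tonic_shift=0.5))   # medicated: dips filled in, Go only
```

In this toy setup the same outcomes produce only NoGo learning when dopamine is shifted down and only Go learning when it is shifted up, which is the direction of the bias seen in the patients.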
The Basal Ganglia in Other Disorders
Given this research on Parkinson’s disease and the genetic connection with positive and negative learning that we identified, we can be fairly confident that the positive and negative learning biases we have measured are basal ganglia–dependent. This understanding enables us to explore how the same process might play out in people with other disorders that include problems with the basal ganglia, even if those other disorders have more than one neurological underpinning. For example, in attention-deficit hyperactivity disorder (ADHD), neurobiological research has consistently implicated dopamine deficiency, and stimulant medications such as Ritalin, often used to treat ADHD, act by directly increasing basal ganglia dopamine.
We administered the same computerized learning task that we had used in people with Parkinson’s disease to adults diagnosed with ADHD to determine whether dopamine dysfunction could explain their motivational deficits. Again, study participants came into the lab twice, once when they were off medication and once after taking their regular dose of stimulant medications. In the off-medication state, ADHD participants showed a global reduction in both choose-A and avoid-B performance, compared with healthy control participants.
Of course, we could chalk up this global deficit to other factors, including a simple lack of attention or motivation to perform well. But the medication manipulation was more informative: stimulants dramatically enhanced choose-A performance by an average of 15 percent (making it identical to that of healthy controls), while having no effect at all on avoid-B performance, which remained at near-chance levels. We further showed that the extent to which medications improved positive, relative to negative, feedback learning was correlated with improvements in other aspects of higher-level cognition, including the ability to pay attention to task-relevant information while ignoring distracting information.
Moreover, these same principles can be applied to understanding addiction. Researchers now agree that the majority of drugs of abuse act by hijacking the natural reward system. When an addict snorts cocaine or smokes a cigarette, he experiences a drug-induced high (“reward”), and the associated dopamine bursts further stamp these destructive behaviors into his brain so that they are more likely to be repeated.
But it is even worse than that. As I mentioned earlier, dopamine signals are primarily associated with unexpected rewards. Normally, as people come to expect a reward, an adaptive process prevents dopamine bursts from occurring when the reward is actually delivered. This process prevents rewarding behaviors from being “overlearned,” possibly so that they can be overridden or unlearned if their consequences change for the worse. Unfortunately, drugs of abuse bypass the circuitry that would normally enable such discounting of expected rewards, and directly elevate dopamine levels.9 Consequently, maladaptive drug-taking behavior is continually strengthened, making it particularly difficult to override. This enhanced stimulus-response learning also may explain the high rate of relapse in recovering addicts who encounter cues associated with taking drugs.
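The contrast between a natural reward and a drug can be sketched numerically. The code below is loosely inspired by the computational account in reference 9 but is a single-action simplification of my own: the function name, the drug_surge parameter, and all numerical values are assumptions.

```python
def learn_value(reward, drug_surge=0.0, trials=200, alpha=0.1):
    """Track the learned value of one action over repeated trials.

    drug_surge is an assumed dopamine increase that cannot be cancelled
    by expectation; it is 0.0 for a natural reward.
    """
    value = 0.0
    signals = []
    for _ in range(trials):
        # Ordinary prediction error shrinks as the reward becomes expected.
        error = reward - value
        # With a drug, the dopamine signal never falls below the surge.
        signal = max(error + drug_surge, drug_surge)
        value += alpha * signal
        signals.append(signal)
    return value, signals


natural_value, natural_signals = learn_value(reward=1.0)
drug_value, drug_signals = learn_value(reward=1.0, drug_surge=0.2)

print(natural_value, natural_signals[-1])  # value levels off, signal fades
print(drug_value, drug_signals[-1])        # value keeps growing, signal persists
```

For the natural reward the learning signal dies away once the reward is fully expected, so the action's value stops growing. With the assumed surge the signal never drops below the surge, so the value of drug taking keeps climbing with every repetition.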
Negative learners may show relatively harmless traits, such as being generally conservative and avoiding risky menu items at a restaurant. But, at the extreme, a focus on errors and negative feedback can lead to obsessiveness and hyperperfectionism. Some evidence from neuroimaging studies suggests that people with obsessive-compulsive disorder (OCD) have a hyperactive error-monitoring system, as revealed by the activity in their brain when they are making mistakes in a cognitive task. But although some evidence for basal ganglia pathology in this disorder has been found, it is not clear why in some cases an overactive error system leads to avoidance of maladaptive behaviors, while in OCD, people tend to repeat the same behaviors again and again. More-elaborate neurobiological, computational, and psychological studies are needed to decipher this and other conundrums associated with maladaptive behaviors in more-complex disorders, including OCD and schizophrenia.
The Power of Implicit Learning
Taken together, this research on the basal ganglia’s role in learning raises some provocative—and, to some, perhaps frightening—possibilities. For example, should grade school children be genotyped to determine whether they are more likely to benefit from teaching that emphasizes positive or negative feedback? We know that underachieving students can thrive when placed in a different learning environment. Attention to individual genetic differences, based on the science, could enhance the probability of success by developing a foundation for determining which students are most likely to succeed in which environment. Similarly, if people with ADHD who are receiving a stimulant medication learn well from positive feedback but not negative feedback, that response may imply that they should be motivated to achieve their goals through positive reinforcement. Giving them negative feedback following maladaptive behavior, or threatening punishment for poor behavior, may simply not be a worthwhile strategy.
Many human behaviors can be understood from the perspective of reinforcement learning, even though it does not entirely account for more-complex behaviors. The emerging scientific consensus is that the more-primitive learning system, centered in the basal ganglia, may dictate more of our choices than we usually like to think. The fundamental principles governing action selection and reinforcement in the basal ganglia can also be extended to explain aspects of higher-order decisions and working memory. Indeed, one can think of these more advanced biological circuits as also reflecting a Go/NoGo decision. The choice in this case is between an action directed by the basal ganglia’s implicit learning system or one that is a result of more elaborate conscious processing in prefrontal areas of the brain, which can override the more primitive basal ganglia system.
If a common reaction is that what is effectively a series of neuronal reflexes could never do justice to the full glory of human thought and action, then perhaps, as my graduate neuroanatomy professor once said, people just don’t have enough respect for reflexes. And nothing can show this more readily than a failure of the system, as in neurological disorders.
1. Jog, MS, Kubota, Y, and Graybiel, AM. Building Neural Representations of Habits. Science 1999; 286: 1745.
2. Schultz, W. Getting Formal with Dopamine and Reward. Neuron 2002; 36: 241–263.
3. Montague, PR, Dayan, P, and Sejnowski, TJ. A Framework for Mesencephalic Dopamine Systems Based on Predictive Hebbian Learning. Journal of Neuroscience 1996; 16: 1936–1947.
4. Seger, CA, and Cincotta, CM. The Roles of the Caudate Nucleus in Human Classification Learning. Journal of Neuroscience 2005; 25: 2941–2951.
5. Mink, JW. The Basal Ganglia: Focused Selection and Inhibition of Competing Motor Programs. Progress in Neurobiology 1996; 50: 381–425.
6. Frank, MJ. Dynamic Dopamine Modulation in the Basal Ganglia: A Neurocomputational Account of Cognitive Deficits in Medicated and Non-medicated Parkinsonism. Journal of Cognitive Neuroscience 2005; 17: 51–72.
7. Frank, MJ, Moustafa, AA, Haughey, H, Curran, T, and Hutchison, K. Genetic Triple Dissociation Reveals Multiple Roles for Dopamine in Reinforcement Learning. Proceedings of the National Academy of Sciences 2007; 104: 16311–16316.
8. Frank, MJ, Seeberger, LC, and O’Reilly, RC. By Carrot or by Stick: Cognitive Reinforcement Learning in Parkinsonism. Science 2004; 306: 1940–1943.
9. Redish, AD. Addiction as a Computational Process Gone Awry. Science 2004; 306: 1944–1946.